We use FPGAs to implement the dataflow graphs of algorithms. Dataflow computing executes every step of an algorithm in parallel and resembles a system of pipelines with synchronous operations. It maps well to FPGA hardware and can speed up HPC algorithms by one or two orders of magnitude compared to general-purpose hardware.
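As a rough software analogy of this pipeline model, the sketch below chains three hypothetical stages (the stage names and numbers are invented for illustration); each stage transforms the stream element by element, so on dataflow hardware all stages would be working on different elements at the same time.

```python
# Toy analogy of a dataflow pipeline: three streaming stages chained
# together. On an FPGA each stage would be a pipeline segment operating
# concurrently on successive data elements.

def scale(stream, factor):
    for x in stream:
        yield x * factor

def offset(stream, bias):
    for x in stream:
        yield x + bias

def clamp(stream, lo, hi):
    for x in stream:
        yield min(max(x, lo), hi)

data = [1, 2, 3, 4, 5]
pipeline = clamp(offset(scale(data, 10), -5), 0, 40)
result = list(pipeline)  # [5, 15, 25, 35, 40]
```

The generator chain only mimics the structure; the actual speedup on an FPGA comes from all stages running in hardware simultaneously.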
Localization microscopy enhances the resolution of fluorescence light microscopy by about an order of magnitude. The fluorophores are switched between two spectral states, e.g. they blink between bright and dark. Signals that would otherwise be inseparable can then be isolated, and the centre of each signal is determined with increased accuracy. Finally, the obtained positions are plotted in a new image with improved resolution. For a five-minute recording, this analysis can easily take hours on standard hardware.
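A minimal sketch of the localization step, assuming a simple intensity-weighted centroid as the position estimator (production pipelines typically fit a Gaussian model instead; the patch values here are invented):

```python
import numpy as np

# Estimate a fluorophore's position from a small pixel patch via the
# intensity-weighted centroid. This is a simplified stand-in for the
# find-and-fit step, not the production algorithm.

def centroid(patch):
    patch = np.asarray(patch, dtype=float)
    ys, xs = np.indices(patch.shape)
    total = patch.sum()
    return (xs * patch).sum() / total, (ys * patch).sum() / total

# A symmetric 3x3 blob centred on the middle pixel:
patch = [[1, 2, 1],
         [2, 8, 2],
         [1, 2, 1]]
x, y = centroid(patch)  # both 1.0 for this symmetric patch
```

The sub-pixel coordinates obtained this way for each isolated blink are what gets accumulated into the high-resolution image.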
We use a pipelined description of the algorithms that find and fit the signals of the optical markers. Mapping the algorithms to FPGA hardware brought huge speed improvements and made the computational analysis faster than the recording itself. Rewriting the algorithm yielded an acceleration factor of 100, and using the tools and FPGA card from Maxeler Technologies as an application accelerator gave us another factor of 225.
These publications were supported by Maxeler Technologies.
3D tomography is a technique in which the density distribution of a volume is reconstructed by imaging the sample from different angles and computing the volume back from the obtained 2D projections. Electron tomography uses electron beams inside an electron microscope to image the sample and can therefore achieve higher resolution than light microscopy.
Our group is currently analysing to what extent image reconstruction can benefit from dataflow computing on FPGAs. We are implementing the simultaneous algebraic reconstruction technique (SART), with both forward and back projection, on FPGAs as an application accelerator.
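A hedged sketch of one SART-style update on a tiny dense system A x = b, where A stands in for the forward projection and its transpose for the back projection (the FPGA implementation streams these projections rather than forming A explicitly; the matrix and relaxation factor below are illustrative):

```python
import numpy as np

# One SART-style iteration: forward-project the current estimate,
# normalize the projection error by ray lengths, back-project it,
# and normalize by how often each voxel is hit.

def sart_step(A, b, x, relax=1.0):
    row_sums = A.sum(axis=1)           # ray lengths through the volume
    col_sums = A.sum(axis=0)           # voxel hit counts
    residual = (b - A @ x) / row_sums  # normalized projection error
    return x + relax * (A.T @ residual) / col_sums

A = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0],
              [1.0, 0.0, 1.0]])
x_true = np.array([1.0, 2.0, 3.0])
b = A @ x_true                         # simulated projections

x = np.zeros(3)
for _ in range(200):
    x = sart_step(A, b, x, relax=0.5)
# x now closely approximates x_true
```

In a real reconstruction A is huge and sparse, which is exactly why streaming the forward and back projections through pipelined hardware is attractive.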
FPGAs have a long history in data processing for High Energy Physics, ranging from the handling of low-level protocols up to first event-building tasks. Every new FPGA generation comes with an increased device size, and as a consequence a larger number of more complex algorithms can be implemented in hardware. Until now, these algorithms have been described using low-level hardware description languages like VHDL or Verilog.
These languages have proven to be well suited for describing interface blocks such as PCIe, DRAM controllers or serial optical links. However, developing dataflow-based processing algorithms in them is expensive. Complex pipeline architectures with hundreds of stages easily lead to code that is hard to read, and optimized processing modules with differing latencies cannot be integrated easily. Maintaining and modifying this kind of code is a complex task.
Using code generation from higher-level, dataflow-based frameworks in High Energy Physics can reduce the effort of developing firmware dramatically while producing more efficient hardware.
Our aim is to investigate the benefits of using a higher-level framework for FPGA firmware and to compare the results to manually written hardware descriptions.
Machines in use: