By Jayashree Rangarajan, Fernando Martinez Vallina, and Vinay Singh, Xilinx, Inc.
Six years ago, Xilinx began a diligent R&D effort to break down the barrier to FPGA programming by creating a development environment that brings an intuitive software development design loop to FPGAs. The Xilinx SDAccel development environment for OpenCL C, C and C++ lets programmers compile, debug and optimize applications for FPGA devices in ways similar to the processes used for CPUs and GPUs, with the advantage of up to 25x better performance per watt for data center application acceleration.
COMPILE: After the functionality of the application (a median filter, in the full article's example) has been captured in a programming language such as OpenCL C, the first stage of development is compilation. On a CPU or GPU, compilation is a necessary and natural step in the software design flow. The target ISA is fixed and well known, leaving the programmer to worry only about the number of available processing cores and cache misses in the algorithm. FPGA compilation is more of an open question: at compilation time, the target ISA does not exist, the logic resources have yet to be combined into a processing fabric and the system memory architecture is yet to be defined.
The compiler in the SDAccel development environment provides three features that help programmers tackle those challenges: automatic extraction of parallelism among statements within a loop and across loop iterations, automatic memory architecture inference based on read and write patterns to arrays, and architectural awareness of the type and quantity of basic logic elements inside a given FPGA device.
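To make the first two features concrete, here is a minimal sketch of the kind of loop such a compiler has to analyze. It is written in plain C rather than OpenCL C, and the 1-D, three-tap filter shape and the names median3 and median_filter_1d are assumptions made for illustration; they are not taken from the article's median filter. Each iteration of the loop writes only out[i], so iterations do not depend on one another, and each iteration reads three neighboring elements of in, which is the sort of read pattern memory inference would work from.

```c
#include <assert.h>
#include <stddef.h>

/* Median of three values using only min/max selections. */
static int median3(int a, int b, int c)
{
    int lo = a < b ? a : b;   /* min(a, b) */
    int hi = a < b ? b : a;   /* max(a, b) */
    int m  = hi < c ? hi : c; /* min(max(a, b), c) */
    return lo > m ? lo : m;   /* max(min(a, b), min(max(a, b), c)) */
}

/* Hypothetical 1-D, three-tap median filter.
 * The first and last samples are copied through unchanged. */
void median_filter_1d(const int *in, int *out, size_t n)
{
    assert(in != NULL && out != NULL);
    if (n == 0) {
        return;
    }
    out[0] = in[0];
    out[n - 1] = in[n - 1];

    /* Every iteration writes a distinct out[i] and only reads from in,
     * so no iteration depends on the result of another. */
    for (size_t i = 1; i + 1 < n; ++i) {
        out[i] = median3(in[i - 1], in[i], in[i + 1]);
    }
}
```

Called on the five samples 1, 9, 2, 8, 3, this sketch produces 1, 2, 8, 3, 3. How a particular compiler would pipeline or unroll such a loop is not specified by this post.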
DEBUG: An axiom of software development is that application compilation does not equal application correctness. It is only after the application starts to run on the target hardware that a programmer can start to discover, trace and correct errors in the application—in other words, debug.
CPU application debug is a well-understood problem, with a multitude of tools from both commercial vendors and the open-source community available to address it. Once again, FPGAs are another story. How does an application programmer debug something that was created to implement the functionality of a piece of code at a given performance target?
The SDAccel development environment addresses this question by borrowing two concepts from the CPU world: printf and GDB debugging. The printf function is a fundamental tool in the software programmer’s toolbox. Nearly every programming language offers it or an equivalent, and it can be used to expose the state of key application variables during program execution.
In the case of FPGAs, an implementation of printf can consume logic resources that could otherwise be used for implementing algorithm functionality. The printf implementation in the SDAccel environment provides the functionality at close to zero cost in logic resources. The environment achieves this by separating printf data generation from the decoding and user presentation layers. In hardware, generating the data for printf consumes only a few registers, a negligible cost in the register-rich FPGA fabric. Data decoding occurs in the driver to the FPGA. Because the host CPU executes the decode and presentation layers, a software programmer can use printf with virtually zero cost in FPGA resources.
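The split between data generation and decoding can be sketched in plain C. This is an illustration of the general idea only, not the SDAccel implementation: the record layout, the buffer, and the names log_emit and log_decode are all hypothetical. The "device" side stores just a small format identifier and a raw integer, and all string formatting happens later on the "host" side.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical format identifiers shared by both sides. */
enum { FMT_PIXEL_INDEX = 0, FMT_MEDIAN = 1 };
enum { LOG_CAPACITY = 64 };

/* One raw record: which message it is, plus one integer argument. */
typedef struct {
    uint32_t fmt_id;
    int32_t  arg;
} log_record;

typedef struct {
    log_record rec[LOG_CAPACITY];
    size_t     count;
} log_buffer;

/* "Device" side: stores raw values only, with no string handling.
 * Records are silently dropped once the buffer is full. */
static void log_emit(log_buffer *buf, uint32_t fmt_id, int32_t arg)
{
    assert(buf != NULL);
    if (buf->count < LOG_CAPACITY) {
        buf->rec[buf->count].fmt_id = fmt_id;
        buf->rec[buf->count].arg = arg;
        buf->count++;
    }
}

/* "Host" side: turns one raw record into text.
 * Returns snprintf's result, or -1 for an unknown format identifier. */
static int log_decode(const log_record *r, char *dst, size_t dst_size)
{
    switch (r->fmt_id) {
    case FMT_PIXEL_INDEX:
        return snprintf(dst, dst_size, "pixel index = %d", (int)r->arg);
    case FMT_MEDIAN:
        return snprintf(dst, dst_size, "median = %d", (int)r->arg);
    default:
        return -1;
    }
}
```

In this sketch the producer never touches a format string, so the expensive part of printf lives entirely in the decoder. The actual record format and transport used between the FPGA and its driver are not described in this post.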
The second debugging technique borrowed from CPUs is the use of tools such as the GNU Project Debugger (GDB) to set breakpoints and single-step through code. Programmers can use the SDAccel environment’s emulation modes to attach GDB to a running emulation process.
OPTIMIZE: The principles behind application optimization on an FPGA are the same as on a CPU; the difference is in the approach. For a CPU, application code is massaged to fit into the boundaries of the cache and arithmetic units of a processor. In an FPGA, the computation logic is custom assembled for the current application. Therefore, the size of the FPGA and the application’s target performance dictate the optimization constraints. The compiler in the SDAccel environment automatically optimizes the compute logic.
The design loop created by the operations of compile, debug, and optimize is fundamental to software development flows. The SDAccel development environment enables this design loop with tools and techniques similar to the development environment on a CPU, with FPGA-based application acceleration of up to 25x better performance per watt and with a 50x to 75x latency improvement.
Note: This blog post is an excerpt from an article appearing in the first issue of the new Xcell Software Journal, which just went online. The full article includes far more detail, along with an example, and is available both in the PDF of the issue and online.