By Stephen Neuendorffer, Thomas Li, and Fernando Martinez Vallina, Xilinx, Inc.
Imaging applications have grown in both scale and ubiquity in recent years as online pictures and videos, robotics, and driver-assistance applications have proliferated. Across these domains the core algorithms are very similar, so developers need a methodology that lets them quickly retarget products and differentiate them by market and deployment target.
The challenge for the developer lies in optimizing the imaging application for an execution target. By leveraging technology from Xilinx Vivado HLS, Xilinx’s SDAccel development environment makes the use of C++ libraries straightforward for OpenCL application developers targeting FPGAs.
One key characteristic of imaging applications is that they are fundamentally a set of operations on a pixel with respect to a surrounding neighborhood of pixels in space and, for some applications, in time. Because each output pixel depends only on its own neighborhood, we can think of an imaging application as a set of parallel computation tasks that a developer can execute on a CPU, GPU or FPGA.
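To make the neighborhood idea concrete, here is a minimal C++ sketch of a 3x3 box filter on an 8-bit grayscale image. The function name `box_filter_3x3` is hypothetical and not taken from any Xilinx library. Each output pixel is the mean of the input pixels in its 3x3 neighborhood; because no output depends on any other output, every iteration of the two outer loops could in principle run in parallel on a CPU, GPU, or FPGA. Border pixels here simply average the neighbors that fall inside the image, which is only one of several common border-handling choices.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical example: 3x3 box (mean) filter on a row-major 8-bit grayscale image.
// Each output pixel reads only its own 3x3 input neighborhood, so every (x, y)
// iteration is independent of the others.
std::vector<std::uint8_t> box_filter_3x3(const std::vector<std::uint8_t>& in,
                                         int width, int height) {
    assert(in.size() == static_cast<std::size_t>(width) * height);
    std::vector<std::uint8_t> out(in.size(), 0);
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            int sum = 0;
            int count = 0;
            // Visit the neighborhood; border pixels average only the
            // neighbors that lie inside the image.
            for (int dy = -1; dy <= 1; ++dy) {
                for (int dx = -1; dx <= 1; ++dx) {
                    const int ny = y + dy;
                    const int nx = x + dx;
                    if (ny < 0 || ny >= height || nx < 0 || nx >= width) continue;
                    sum += in[ny * width + nx];
                    ++count;
                }
            }
            // Integer division truncates toward zero.
            out[y * width + x] = static_cast<std::uint8_t>(sum / count);
        }
    }
    return out;
}
```

On hardware, the same per-pixel independence is what lets a GPU assign one thread per pixel or an FPGA pipeline stream one pixel per clock cycle.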
The CPU is always the easiest target to start with. The code typically already runs on the CPU before any optimization is considered, and it can leverage the wealth of available libraries. The problem with executing imaging workloads on a CPU is the limited sustained performance it can achieve.
GPUs hold the promise of much higher performance than CPUs for imaging applications because GPU hardware was purpose-built for imaging workloads. Until recently, the drawback of GPUs for general imaging applications was the programming model: GPU programming differed from CPU programming, and GPU programming models were not portable across GPU device families. That changed when programming for parallel systems such as GPUs was standardized under the OpenCL framework.
FPGAs provide an alternative implementation choice for imaging workloads. Developers can customize the FPGA logic fabric into workload-specific circuits. The flexibility of the FPGA fabric lets an application developer leverage the performance and power consumption benefits of custom logic while avoiding the cost and effort associated with ASIC design.
As it was for the GPU, one barrier to adoption of FPGA devices has been the programming model. Traditionally, FPGAs have been programmed at the register-transfer level (RTL) in a hardware description language such as Verilog or VHDL. Although those languages can express parallelism, they do so at a much lower level of abstraction than the languages used to program a CPU or a GPU. As with GPUs, however, the adoption of the OpenCL standard, which expresses FPGA programming in terms familiar to software application developers, has overcome the programming-model hurdle.
The OpenCL framework provides a common programming model for expressing data-parallel programs. The framework, which has evolved into an industry standard, is based on platform and memory models that are consistent across all device vendors supporting OpenCL.
An OpenCL platform always contains one host, which is typically implemented on a processor. The host is responsible for launching tasks on the device and for explicitly coordinating all data transfers between host and device. In addition to the host, a platform contains at least one device, the hardware element capable of executing OpenCL kernel code. On CPU and GPU devices, kernel code is executed on one or more of the device's existing cores. For an FPGA, in contrast, the SDAccel development environment generates custom cores tailored to the computation requirements of the application kernel.
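The host/device split can be sketched in plain C++. The `SimulatedDevice` class, its methods, and `host_scale` below are hypothetical stand-ins, not the OpenCL API; they only mirror the roles of the real OpenCL calls `clEnqueueWriteBuffer`, `clEnqueueNDRangeKernel`, and `clEnqueueReadBuffer`. The device owns its own memory, so the host must copy data in, launch the kernel over a range of work-item IDs, and copy results back. A real device would execute the work-items in parallel; this sketch simply runs them in a loop.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical stand-in for an OpenCL device: it owns separate memory, so the
// host must move data in and out explicitly, as it would with a real device.
class SimulatedDevice {
public:
    // Host-to-device copy (role of clEnqueueWriteBuffer in real OpenCL).
    void write_buffer(const std::vector<float>& host_src) { device_mem_ = host_src; }

    // Run the kernel once per work-item ID in [0, global_size)
    // (role of clEnqueueNDRangeKernel). The kernel sees only device memory.
    void run_kernel(std::size_t global_size,
                    const std::function<void(std::size_t, std::vector<float>&)>& kernel) {
        assert(global_size <= device_mem_.size());
        for (std::size_t gid = 0; gid < global_size; ++gid) kernel(gid, device_mem_);
    }

    // Device-to-host copy (role of clEnqueueReadBuffer).
    void read_buffer(std::vector<float>& host_dst) const { host_dst = device_mem_; }

private:
    std::vector<float> device_mem_;  // not directly accessible to host code
};

// Host program: it launches the task and coordinates every data transfer.
std::vector<float> host_scale(const std::vector<float>& input, float factor) {
    SimulatedDevice device;
    device.write_buffer(input);
    device.run_kernel(input.size(), [factor](std::size_t gid, std::vector<float>& mem) {
        mem[gid] *= factor;  // each work-item handles exactly one element
    });
    std::vector<float> result;
    device.read_buffer(result);
    return result;
}
```

Keeping the transfers explicit in the host code reflects the OpenCL model described above: nothing moves between host and device unless the host asks for it.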
The SDAccel development environment uses technology from Xilinx’s Vivado HLS C-to-RTL compiler in its core kernel compiler, so it can accept kernels expressed in C and C++ in the same way as kernels expressed in OpenCL C. Application developers can thus reuse C++ libraries and code already optimized in Vivado HLS, increasing productivity.
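As an illustration of a C++ kernel written for Vivado HLS, the sketch below applies a per-pixel threshold. The kernel name `threshold_kernel` and its arguments are hypothetical, and the interface pragmas follow the AXI master/AXI-Lite pattern commonly seen in SDAccel C/C++ kernel examples; the exact pragma options can vary between tool versions, so treat them as an assumption and check the SDAccel and Vivado HLS documentation for your release.

```cpp
#include <cstdint>

// Hypothetical kernel name and arguments, for illustration only.
// extern "C" prevents C++ name mangling so the host runtime can find the
// kernel by its plain name, as it would an OpenCL C kernel.
extern "C" void threshold_kernel(const std::uint8_t* in, std::uint8_t* out,
                                 int num_pixels, std::uint8_t level) {
// Assumed Vivado HLS interface pragmas: pointer arguments become AXI master
// ports to global memory; arguments and block-level control signals go on an
// AXI-Lite control port. An ordinary C++ compiler ignores these lines.
#pragma HLS INTERFACE m_axi port=in offset=slave bundle=gmem
#pragma HLS INTERFACE m_axi port=out offset=slave bundle=gmem
#pragma HLS INTERFACE s_axilite port=in bundle=control
#pragma HLS INTERFACE s_axilite port=out bundle=control
#pragma HLS INTERFACE s_axilite port=num_pixels bundle=control
#pragma HLS INTERFACE s_axilite port=level bundle=control
#pragma HLS INTERFACE s_axilite port=return bundle=control
    for (int i = 0; i < num_pixels; ++i) {
// Request that HLS start a new loop iteration every clock cycle.
#pragma HLS PIPELINE II=1
        out[i] = (in[i] >= level) ? 255 : 0;
    }
}
```

Because a standard C++ compiler ignores the HLS pragmas, the same function can be called from an ordinary C++ test bench to check functional behavior before synthesis.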
Note: This blog post is an excerpt from an article appearing in the new Xcell Software Journal, which just went online. You can read the full article, which includes far more detail and a worked example, by downloading the PDF of the first issue or by reading the article online.