Newsletter


Jacket & GPUs in Radar Processing


With the advances in graphics processing units (GPUs) in recent years, and their ability to deliver an enormous number of floating-point operations per second (measured in GFLOPS, billions of operations per second), particularly per watt or per dollar, developers of radar processing applications are now looking at GPUs more seriously. System Planning Corporation (SPC), a leading systems integration firm with an emphasis on superior scientific analysis, unsurpassed system engineering, and winning prototype design, set out to evaluate the viability of GPUs in radar processing applications by comparing GPU performance to that of CPU-based and DSP-based solutions. As with all applications, the extent to which GPUs can improve processing speed depends heavily on the extent to which the calculations can be parallelized.

Radar Clutter Reduction: SPC set out to use AccelerEyes' Jacket software platform to accelerate radar processing algorithms. In one application, raw data from marine navigation radars was processed using a variety of thresholding techniques to extract real targets from clutter. This is highly data-parallel processing: each radar pulse is subjected to the same computations, and very few operations span multiple pulses. Using Jacket, SPC achieved a 10x speed improvement relative to a Core i7-920 CPU and a 5x improvement relative to a real-time DSP implementation.
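The case study does not say which thresholding techniques SPC used. As a minimal sketch, assuming a cell-averaging CFAR (constant false alarm rate) detector, a common clutter-thresholding method chosen here purely for illustration, the Python below applies the same computation independently to every pulse. The function names `ca_cfar` and `threshold_scan` and the `guard`, `train` and `scale` parameters are hypothetical. The point is the structure: no pulse needs data from any other pulse, which is what makes the workload data-parallel.

```python
from typing import List


def ca_cfar(pulse: List[float], guard: int = 2, train: int = 8,
            scale: float = 3.0) -> List[bool]:
    """Hypothetical cell-averaging CFAR over one pulse (one PRI of range samples).

    A sample is flagged as a target when its power exceeds `scale` times the
    mean power of the training cells on both sides, skipping `guard` cells
    adjacent to the cell under test. Parameter values are illustrative.
    """
    n = len(pulse)
    detections = [False] * n
    for i in range(n):
        # Training cells to the left and right, excluding the guard cells.
        left = pulse[max(0, i - guard - train):max(0, i - guard)]
        right = pulse[min(n, i + guard + 1):min(n, i + guard + 1 + train)]
        cells = left + right
        if not cells:
            continue
        noise = sum(cells) / len(cells)  # local clutter estimate
        detections[i] = pulse[i] > scale * noise
    return detections


def threshold_scan(scan: List[List[float]]) -> List[List[bool]]:
    """Apply the same detector to every pulse in a scan.

    Each pulse is processed with no data from other pulses, so the list
    comprehension could be replaced by one parallel task per pulse.
    """
    return [ca_cfar(pulse) for pulse in scan]
```

A real detector would be tuned to the radar's clutter statistics; the sketch only shows why per-pulse thresholding maps naturally onto many parallel workers.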



Solution

In order to facilitate algorithm development and testing, the company implemented its existing DSP-based code in MATLAB. Originally, the code was almost a line-by-line translation of the DSP's C code into MATLAB. While this ensured identical results between the DSP and MATLAB versions, it did not allow MATLAB to take advantage of its fast vector processing.

Once the MATLAB code was validated, the company invested time to "vectorize" it. Because every pulse undergoes the same computations, many calculations can be vectorized so that each Pulse Repetition Interval (PRI) is operated on as a whole rather than sample by sample. The PRI is the elapsed time from the beginning of one pulse to the beginning of the next. Radar processing uses these pulses to determine the distance of objects from the radar and the velocity of targets.
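To make the PRI definition concrete, the sketch below uses standard radar relations: the pulse repetition frequency is PRF = 1/PRI, an echo received t seconds after a pulse comes from range R = c·t/2 (the pulse travels out and back), the farthest range measurable without ambiguity is c·PRI/2, and a Doppler shift f_d at wavelength λ corresponds to a radial velocity v = f_d·λ/2. The helper names are illustrative and are not taken from SPC's code; the 2,400 Hz PRF is the navigation-radar figure quoted in the Results section.

```python
# Standard radar timing relations (illustrative helper names, not SPC's code).
C = 299_792_458.0  # speed of light in m/s


def pri_from_prf(prf_hz: float) -> float:
    """Pulse repetition interval in seconds: PRI = 1 / PRF."""
    return 1.0 / prf_hz


def range_from_delay(delay_s: float) -> float:
    """Range in metres of an echo received delay_s seconds after the pulse.

    The pulse travels out and back, so the one-way range is c * t / 2.
    """
    return C * delay_s / 2.0


def max_unambiguous_range(pri_s: float) -> float:
    """Farthest range whose echo returns before the next pulse is transmitted."""
    return range_from_delay(pri_s)


def radial_velocity(doppler_hz: float, wavelength_m: float) -> float:
    """Radial velocity in m/s from a Doppler shift: v = f_d * wavelength / 2."""
    return doppler_hz * wavelength_m / 2.0


if __name__ == "__main__":
    pri = pri_from_prf(2400.0)
    print(f"PRI: {pri * 1e6:.1f} us")
    print(f"Max unambiguous range: {max_unambiguous_range(pri) / 1e3:.1f} km")
```

At 2,400 Hz the PRI is about 417 µs, which limits the unambiguous range to roughly 62 km.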

Furthermore, for applications that do not have stringent data latency requirements, many calculations can be easily performed on matrices of many PRIs representing either a complete scan or a substantial portion of a scan. This vectorization in both the range and PRI dimensions greatly improves processing speed in MATLAB and makes the calculations more conducive to GPU processing.
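As a minimal sketch of the two-dimensional layout described above, assuming the raw samples arrive as one flat stream with 768 samples per PRI (the sample count of the test data in the Results section), the stream can be reshaped into a PRIs × range-bins matrix covering a scan. Whole-scan operations can then run along either dimension. The function names are hypothetical, and the per-range-bin average is only an example of a calculation that spans PRIs, not one taken from SPC's code.

```python
from typing import List, Sequence

SAMPLES_PER_PRI = 768  # matches the test data described in this case study


def to_scan_matrix(stream: Sequence[float],
                   samples_per_pri: int = SAMPLES_PER_PRI) -> List[List[float]]:
    """Reshape a flat sample stream into rows of one PRI each (PRI x range)."""
    if len(stream) % samples_per_pri != 0:
        raise ValueError("stream length must be a whole number of PRIs")
    return [list(stream[i:i + samples_per_pri])
            for i in range(0, len(stream), samples_per_pri)]


def mean_per_range_bin(scan: List[List[float]]) -> List[float]:
    """Average each range bin (column) across all PRIs in the scan."""
    n_pri = len(scan)
    return [sum(column) / n_pri for column in zip(*scan)]
```

In a vectorized environment each of these steps becomes a single array operation over the whole matrix; the Python loops here only illustrate the data layout that makes that possible.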

GPU processing on this code was made possible by Jacket, a software platform that provides just-in-time compilation, a run-time system, and libraries that let MATLAB applications use GPUs to accelerate performance. Combining Jacket with the vectorized MATLAB port of the original DSP C code allowed a broad set of CPU and GPU systems to be evaluated.

Results

SPC spent less than a week vectorizing the MATLAB code and, during the same process, used the Jacket platform to GPU-enable it. With minimal development time, code originally written for a DSP was up and running on GPU-based hardware.

To fully understand the possible solutions with and without DSPs, SPC ran a series of performance tests; results are shown in the figures below. The data processed was a twenty-scan file with 768 samples per PRI, and the following configurations were tested:

  • CentOS x86-64 Linux, Tesla C1060 GPU - Core i7-920 @ 2.67 GHz system running the CentOS x86-64 distribution of Linux. Calculations are vectorized such that each scan is processed as a whole. Most calculations have been moved to the Tesla C1060 GPU.
  • CentOS x86-64 Linux, GeForce 9800 GT GPU - Core i7-920 @ 2.67 GHz system running the CentOS x86-64 distribution of Linux. Calculations are vectorized such that each scan is processed as a whole. Most calculations have been moved to the GeForce 9800 GT GPU.
  • Win7-64bit, Tesla C1060 GPU - Core i7-920 @ 2.67 GHz system running 64-bit Windows 7. Calculations are vectorized such that each scan is processed as a whole. Most calculations are performed on the Tesla C1060 GPU.
  • Win7-64bit, GeForce 9800 GT GPU - Core i7-920 @ 2.67 GHz system running 64-bit Windows 7. Calculations are vectorized such that each scan is processed as a whole. Most calculations are performed on the GeForce 9800 GT GPU.
  • Win7-64bit, GeForce 210 GPU - Intel E6500 @ 2.93 GHz system running 64-bit Windows 7. Calculations are vectorized such that each scan is processed as a whole. Most calculations are performed on the GeForce 210 GPU.
  • CentOS x86-64 Linux, Core i7-920 CPU, Vectorized - Core i7-920 @ 2.67 GHz system running the CentOS x86-64 distribution of Linux. Calculations are vectorized such that each scan is processed as a whole.
  • Win7-64bit, Core i7-920 CPU, Vectorized - Core i7-920 @ 2.67 GHz system running 64-bit Windows 7. Calculations are vectorized such that each scan is processed as a whole.
  • Win7-64bit, E6500 CPU, Vectorized - Intel E6500 @ 2.93 GHz system running 64-bit Windows 7. Calculations are vectorized such that each scan is processed as a whole.
  • WinXP32, Core 2 Duo T7100 CPU - Lenovo T61 laptop with a Core 2 Duo T7100 @ 1.79 GHz running 32-bit Windows XP. Most calculations are vectorized such that each scan is processed as a whole.


[Figure: GPU run times]

Figure 3 also contains a vertical black line at 2400 Hz, which is typical of the maximum pulse repetition frequency (PRF, the reciprocal of the PRI) for navigation radars. To keep up with the data flow, the maximum data processing rate must therefore exceed this line. All three GPU configurations satisfy this condition; the non-GPU Linux configuration does as well, but with very little margin. The other non-GPU configurations do not reach the 2400 Hz threshold. Because they can keep up with the data flow, GPUs can deliver real-time radar processing solutions, provided attention is also paid to latency requirements.
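The 2400 Hz threshold can also be expressed as a throughput budget. With 768 samples per PRI, keeping up requires processing 2,400 × 768 = 1,843,200 samples per second, which leaves about 417 µs of processing time per PRI. The helpers below are a hypothetical way to check a measured processing rate against that budget; they are not SPC's benchmark code.

```python
# Real-time budget check (hypothetical helpers; constants come from this case study).
NAV_RADAR_MAX_PRF_HZ = 2400.0  # typical maximum PRF for navigation radars
SAMPLES_PER_PRI = 768          # samples per PRI in the test file


def required_sample_rate(prf_hz: float = NAV_RADAR_MAX_PRF_HZ,
                         samples_per_pri: int = SAMPLES_PER_PRI) -> float:
    """Samples per second that must be processed to keep up with the radar."""
    return prf_hz * samples_per_pri


def per_pri_budget_s(prf_hz: float = NAV_RADAR_MAX_PRF_HZ) -> float:
    """Processing time available for each PRI, in seconds."""
    return 1.0 / prf_hz


def realtime_margin(processed_pri_per_s: float,
                    prf_hz: float = NAV_RADAR_MAX_PRF_HZ) -> float:
    """Achieved PRI rate divided by incoming PRI rate; above 1.0 means it keeps up."""
    return processed_pri_per_s / prf_hz
```

For example, a configuration that sustained a hypothetical 2,500 PRIs per second would have a margin of only about 1.04, leaving little headroom for latency or load spikes.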

Observations

The data in Figure 2 and Figure 3 leads to the following observations:

  • With both CPU-only and GPU-enabled calculations, the code runs much faster in Linux than in Windows. Line-by-line comparisons show that for time-consuming calculations, Linux is 25% to 50% faster than Windows.
  • In either operating system, the performance difference between GPUs is not commensurate with their differing performance specifications. The calculations performed on the GPU are few and simple, so they do not take full advantage of the additional capabilities provided by the Tesla C1060.
  • In either operating system, the difference in performance between the CPU and GPU is relatively small compared to published/advertised GPU capabilities. Two primary reasons explain why the GPU does not provide more speed improvement over the CPU. First, the calculations performed on the GPU are few and simple, and the CPU performs them quite efficiently. Second, the GPU is not being used as effectively as it could be: in the current approach, GPU computation times are likely dominated by the overhead of passing data back and forth between the CPU and GPU.
  • The performance differences between the Tesla and GeForce GPUs, and between the CPU and GPUs, are larger in Linux than in Windows. This may indicate that Jacket incurs more overhead in Windows than in Linux; a larger overhead, which should be independent of the graphics card used, would mask the performance differences between configurations.
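The overhead argument in these observations can be illustrated with a simple timing model. This is an assumption for illustration, not something measured in this study: if each scan pays a fixed cost to move data to and from the GPU, the GPU's speedup over the CPU is t_cpu / (t_transfer + t_cpu / k), where k is how much faster the GPU does the arithmetic itself. The function names and numbers below are hypothetical.

```python
def effective_speedup(t_cpu_s: float, t_transfer_s: float,
                      compute_speedup: float) -> float:
    """GPU-over-CPU speedup when every scan pays a fixed transfer cost.

    t_cpu_s: time for the CPU to process one scan.
    t_transfer_s: time to copy the scan to the GPU and results back (assumed fixed).
    compute_speedup: how much faster the GPU performs the arithmetic itself.
    """
    t_gpu_s = t_transfer_s + t_cpu_s / compute_speedup
    return t_cpu_s / t_gpu_s


def speedup_limit(t_cpu_s: float, t_transfer_s: float) -> float:
    """Upper bound on speedup as GPU compute time approaches zero."""
    return t_cpu_s / t_transfer_s


if __name__ == "__main__":
    # Hypothetical numbers: 1.0 s of CPU work per scan, 0.5 s of transfers.
    for k in (2.0, 10.0, 100.0):
        print(k, round(effective_speedup(1.0, 0.5, k), 2))
```

With these made-up numbers, a GPU that computes 2x, 10x or 100x faster yields overall speedups of roughly 1.0, 1.7 and 2.0, never exceeding the limit of 2.0 set by the transfers. The same fixed overhead, being independent of the card, is why a faster GPU such as the Tesla would show little advantage over a GeForce.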


Conclusions

Although some users require rugged solutions and millisecond latencies before they can fully consider GPU-based solutions as alternatives to DSP-based offerings, the lower-cost configurations outlined in this study offer real potential for many other users. This simple GPU test and benchmarking effort showed that a low-end Core i7 augmented by a low-cost GPU has the potential to perform radar processing in real time. Furthermore, additional GPU speed enhancements were achieved by rewriting the processing code to better utilize the GPU resources. Each of these results will help define future solutions for radar processing.


The Authors

  • System Planning Corporation
  • Gary Rubin and Dave Berger


