### 2D DWT Image Compression on the GPU

This work demonstrates the acceleration of 2D DWT compression of medical images using CUDA-capable GPUs and the Jacket software platform for MATLAB®. Recent development trends have contributed to the widespread adoption of medical imaging as an important tool in healthcare diagnostics [1]. Acquisition modalities such as computed tomography (CT), magnetic resonance imaging (MRI), positron emission tomography or nuclear medicine generate a huge volume of data, so there is a common need to compress, manipulate and transmit this data in as little time as possible.

Wavelet-based image compression allows multi-resolution analysis of data: it can be applied at different scales according to the level of detail required, and it avoids the need for extra storage. Another encouraging feature of the wavelet transform is its symmetric nature: the forward and inverse transforms have the same complexity, which allows fast compression and decompression routines to be built. Its characteristics are well suited to image compression. They include the ability to take the characteristics of the Human Visual System (HVS) into account, very good energy compaction, robustness under transmission and a high compression ratio. This work demonstrates the acceleration of a 2D image compression algorithm using Haar wavelets [2] on the GPU, which runs much faster than its CPU variant. Compared with a previous effort to perform 2D DWT image compression on the GPU [3], the authors obtained better GPU performance for different levels of decomposition and for image reconstruction.

### 2D Image Compression using Haar wavelets

Haar wavelet based image compression of 2D images involves the following three steps:

- Compute the transformed matrix by applying averaging and differencing to the input image (first to each row, then to each column).
- Choose a thresholding method and apply it to the transformed matrix to obtain a new matrix, say D.
- Use D to compute the compression ratio and to reconstruct the image.
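
The first step above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the function names `haar_step_1d` and `haar_step_2d` are hypothetical, the input is assumed to have even dimensions, and the pairwise average and half-difference convention is used, which may differ in scaling from the filters used by MATLAB's wavelet routines.

```python
def haar_step_1d(values):
    """One averaging-and-differencing pass over an even-length sequence.

    Returns the pairwise averages followed by the pairwise half-differences.
    """
    half = len(values) // 2
    averages = [(values[2 * i] + values[2 * i + 1]) / 2 for i in range(half)]
    details = [(values[2 * i] - values[2 * i + 1]) / 2 for i in range(half)]
    return averages + details


def haar_step_2d(image):
    """Apply the 1D pass to every row, then to every column."""
    rows_done = [haar_step_1d(row) for row in image]
    # zip(*matrix) walks the columns; transform each one, then transpose back.
    cols_done = [haar_step_1d(list(col)) for col in zip(*rows_done)]
    return [list(row) for row in zip(*cols_done)]


# Small made-up 4x4 "image" used purely for illustration.
image = [
    [4, 6, 10, 12],
    [8, 6, 8, 10],
    [2, 4, 6, 8],
    [0, 2, 4, 4],
]
transformed = haar_step_2d(image)
for row in transformed:
    print(row)
```

After one level, the top-left quadrant of `transformed` holds the 2x2 block averages of the input (the approximation), and the other three quadrants hold the detail coefficients that thresholding later acts on.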

The quality of the reconstructed image depends on the number of decomposition levels and the thresholding method used. Both choices rest on the assumption that, because the human visual system is imperfect, some amount of information may be discarded. As the figure below demonstrates, a reasonable degree of lossy compression still yields results usable for diagnostic purposes.

2D Haar wavelet-based image compression in MATLAB is performed using
MATLAB's built-in *wavedec2* and *wdencmp*
routines. *wavedec2* is a two-dimensional wavelet analysis function
that returns the wavelet decomposition of a matrix at level N using a
given wavelet, which in this case is Haar. *wdencmp* is a one- or
two-dimensional de-noising and compression function. Here it de-noises
or compresses a signal or an image using Haar wavelets: it returns a
de-noised or compressed version XC of the input signal X (one- or
two-dimensional), obtained by thresholding the wavelet coefficients
with a positive global threshold.

### Accelerated 2D Haar Wavelet Image Compression on GPU

The image compression algorithm was accelerated by
writing two functions, *wavedec2GPU* and *wdencmpGPU*, that mimic
the MATLAB built-in routines but execute on the
GPU. *wavedec2GPU* performs the Nth-level decomposition of an input
image using Haar wavelets on the GPU. *wdencmpGPU* performs hard
thresholding using a global threshold supplied by the user and then
reconstructs the image from the detail and approximation
coefficients computed by *wavedec2GPU*. To perform the next
level of decomposition, Jacket's sub-reference and sub-assignment
operators for GPU data types were used.
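
The threshold-then-reconstruct idea can be shown on a single row in plain Python. This is a sketch only, not the *wdencmpGPU* code: `forward_pass` and `inverse_pass` are hypothetical names, the row values are made up, and the average and half-difference convention is assumed.

```python
def forward_pass(values):
    """Pairwise averages followed by pairwise half-differences."""
    half = len(values) // 2
    averages = [(values[2 * i] + values[2 * i + 1]) / 2 for i in range(half)]
    details = [(values[2 * i] - values[2 * i + 1]) / 2 for i in range(half)]
    return averages + details


def inverse_pass(coeffs):
    """Undo forward_pass: each (average, detail) pair yields two samples."""
    half = len(coeffs) // 2
    values = []
    for i in range(half):
        average, detail = coeffs[i], coeffs[half + i]
        values.append(average + detail)
        values.append(average - detail)
    return values


row = [4, 6, 10, 12, 8, 6, 8, 11]
coeffs = forward_pass(row)

# Without thresholding the round trip is exact.
lossless = inverse_pass(coeffs)

# Hard thresholding with a global threshold of 1.0, then reconstruction.
thresholded = [c if abs(c) > 1.0 else 0.0 for c in coeffs]
lossy = inverse_pass(thresholded)

print(lossless)
print(lossy)
```

The lossy row differs from the original by at most one grey level here, because only details with magnitude at or below the threshold were discarded.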

In the wavelet decomposition function (*wavedec2GPU*), the number of
thread blocks equals the number of rows of the input image, and each
block contains half as many threads as there are columns. Thus for a
512x512 image there are 512 blocks of 256 threads each, and each
thread computes the sum and difference of a pair of adjacent pixels. The same
procedure is then performed on each column, completing one level of
decomposition. To perform the next level of decomposition, the approximation
coefficients are supplied as input to the *wavedec2GPU* function.
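
The work distribution described above can be emulated sequentially in Python, with one loop standing in for the thread blocks and another for the threads in a block. This is a CPU-side sketch, not the authors' kernel: all function names are hypothetical, a square image with power-of-two sides is assumed, averages and half-differences are used in place of raw sums and differences, and list slicing stands in for Jacket's sub-reference and sub-assignment operators.

```python
def launch_shape(height, width):
    """(blocks, threads per block): one block per row, one thread per pixel pair."""
    return height, width // 2


def row_pass_as_grid(image):
    """Emulate one launch: block index selects the row, thread index the pair."""
    blocks, threads = launch_shape(len(image), len(image[0]))
    out = [[0.0] * len(image[0]) for _ in range(blocks)]
    for block_idx in range(blocks):
        for thread_idx in range(threads):
            left = image[block_idx][2 * thread_idx]
            right = image[block_idx][2 * thread_idx + 1]
            out[block_idx][thread_idx] = (left + right) / 2
            out[block_idx][threads + thread_idx] = (left - right) / 2
    return out


def transpose(matrix):
    return [list(row) for row in zip(*matrix)]


def one_level(image):
    """Row pass, then the same pass on the columns (via a transpose)."""
    return transpose(row_pass_as_grid(transpose(row_pass_as_grid(image))))


def decompose(image, levels):
    """Re-apply one_level to the top-left approximation block at each level."""
    result = [[float(v) for v in row] for row in image]
    size = len(result)
    for _ in range(levels):
        approx = [row[:size] for row in result[:size]]   # sub-reference
        transformed = one_level(approx)
        for r in range(size):                            # sub-assignment
            result[r][:size] = transformed[r]
        size //= 2
    return result


image = [
    [4, 6, 10, 12],
    [8, 6, 8, 10],
    [2, 4, 6, 8],
    [0, 2, 4, 4],
]
print(launch_shape(512, 512))
print(decompose(image, 2)[0][0])
```

With two levels on this 4x4 example, the single remaining approximation coefficient is the mean of all sixteen pixels, which is a quick sanity check for the multi-level logic.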

The following table shows the speed-up obtained by comparing the performance of
MATLAB's built-in *wavedec2* and *wdencmp* functions with their GPU
variants (*wavedec2GPU* and *wdencmpGPU*), implemented
using the Jacket SDK and its capability to handle GPU data types. The Nth-level
decomposition is performed on a sub-region of the processed image using Jacket's
sub-reference and sub-assignment functions. Tests were performed on
512x512 head-scan CT images for varying decomposition levels. Running on
an NVIDIA Tesla C1060 and compared against an Intel Xeon CPU at 2.5 GHz with
4 GB RAM, Jacket **achieves a 38x speedup** over CPU-based MATLAB. The
2D DWT image compression technique is suitable for real-time compression,
transmission and fast decompression, offering high compression ability
with results usable for diagnostic purposes.

Hard thresholding: 12.344

### The Authors

**Indian Institute of Technology, Roorkee**

- Jaideep Singh
- Ipseeta Aruni
- Dr. R. Balasubramanian

### References

- [1] Semmlow, J. L.: Biosignal and Biomedical Image Processing, Marcel Dekker Inc., ISBN 978-0-471-76777-0
- [2] http://en.wikipedia.org/wiki/Haar_wavelet
- [3] Simek, V.; Asn, R.R., "GPU Acceleration of 2D-DWT Image Compression in MATLAB with CUDA," Second UKSIM European Symposium on Computer Modeling and Simulation (EMS '08), 2008, pp. 274-277
