2D DWT Image Compression on the GPU

This work demonstrates acceleration of 2D DWT compression of medical images using CUDA-capable GPUs and the Jacket software platform for MATLAB®. Recent development trends have contributed to the widespread adoption of medical imaging as an important tool in healthcare diagnostics [1]. Acquisition modalities such as computed tomography (CT), magnetic resonance imaging (MRI), and positron emission tomography or nuclear medicine generate a huge volume of data, so there is a common need to compress this data so that it can be manipulated and transmitted in as little time as possible.

Wavelet-based image compression allows multi-resolution analysis of data: it can be applied at different scales according to the level of detail required, and it avoids the need for extra storage. Another encouraging feature of the wavelet transform is its symmetric nature: the forward and inverse transforms have the same complexity, which allows fast compression and decompression routines to be built. Its characteristics are well suited to image compression; they include the ability to take the characteristics of the Human Visual System (HVS) into account, very good energy compaction, robustness under transmission, and a high compression ratio. This work demonstrates a 2D image compression algorithm using Haar wavelets [2] accelerated on the GPU, which runs much faster than its CPU variant. Compared with previous efforts to perform 2D DWT image compression on the GPU [3], the authors obtained better performance on the GPU for different levels of decomposition and image reconstruction.

2D Image Compression using Haar wavelets

Haar wavelet based image compression of 2D images involves the following three steps:

  • Compute the transformed matrix by applying the averaging and differencing operation to the input image, first to each row and then to each column.
  • Choose a thresholding method and apply it to the transformed matrix to obtain a new matrix, say D.
  • Use D to compute the compression ratio and to reconstruct the image.
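The first step above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the authors' implementation: the function names `haar_step_1d` and `haar_step_2d` are hypothetical, the image is assumed to have even dimensions, and the unnormalized "divide by 2" form of averaging and differencing is used (other Haar implementations may scale by 1/sqrt(2) instead).

```python
def haar_step_1d(values):
    """One averaging-and-differencing pass over an even-length sequence.

    Returns the pairwise averages followed by the pairwise half-differences.
    (Assumption: unnormalized Haar step, scaling by 1/2.)
    """
    half = len(values) // 2
    averages = [(values[2 * i] + values[2 * i + 1]) / 2 for i in range(half)]
    details = [(values[2 * i] - values[2 * i + 1]) / 2 for i in range(half)]
    return averages + details


def haar_step_2d(image):
    """One level of 2D decomposition: transform every row, then every column."""
    row_pass = [haar_step_1d(row) for row in image]
    # zip(*m) iterates over columns; transform each column, then transpose back.
    col_pass = [haar_step_1d(list(col)) for col in zip(*row_pass)]
    return [list(row) for row in zip(*col_pass)]


if __name__ == "__main__":
    tiny = [[4, 2],
            [2, 0]]
    print(haar_step_2d(tiny))  # [[2.0, 1.0], [1.0, 0.0]]
```

After one pass, the top-left quadrant holds the averages (a half-resolution version of the image) and the other three quadrants hold the detail coefficients.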

The quality of the reconstructed image depends on the number of decomposition levels and the thresholding method used. Both choices rest on the assumption that, because the human visual system is imperfect, some amount of information may be discarded. As the figure below demonstrates, a reasonable degree of lossy compression still yields results usable for diagnostic purposes.

2D Haar wavelet-based image compression in MATLAB is performed using MATLAB's built-in wavedec2 and wdencmp routines. wavedec2 is a two-dimensional wavelet analysis function that returns the wavelet decomposition of a matrix at level N using a given wavelet, in this case Haar. wdencmp is a one- or two-dimensional de-noising and compression function. Here it is used with Haar wavelets and returns a de-noised or compressed version XC of the input signal X (one- or two-dimensional), obtained by thresholding the wavelet coefficients with a global positive threshold.
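To make global hard thresholding concrete, here is a minimal Python sketch. It is not MATLAB's wdencmp: the helper names are hypothetical, the threshold is applied to every coefficient in the matrix, and the compression ratio is assumed to be the number of nonzero coefficients before thresholding divided by the number remaining afterwards, which is one common definition and may differ from the one the authors used.

```python
def hard_threshold(coeffs, threshold):
    """Keep coefficients whose magnitude exceeds the threshold; zero the rest."""
    return [[c if abs(c) > threshold else 0.0 for c in row] for row in coeffs]


def count_nonzero(matrix):
    """Number of entries that are not exactly zero."""
    return sum(1 for row in matrix for c in row if c != 0)


def compression_ratio(before, after):
    """Nonzero count before thresholding over nonzero count after.

    (Assumed definition; returns infinity if everything was zeroed.)
    """
    kept = count_nonzero(after)
    return float("inf") if kept == 0 else count_nonzero(before) / kept


if __name__ == "__main__":
    coeffs = [[10.0, 0.5],
              [-3.0, 0.2]]
    d = hard_threshold(coeffs, 1.0)
    print(d)                             # [[10.0, 0.0], [-3.0, 0.0]]
    print(compression_ratio(coeffs, d))  # 2.0
```

Hard thresholding leaves the surviving coefficients unchanged, in contrast to soft thresholding, which would also shrink them toward zero.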

Accelerated 2D Haar Wavelet Image Compression on GPU

The image compression algorithm was accelerated by writing two functions, wavedec2GPU and wdencmpGPU, that mimic the MATLAB built-in routines but execute on the GPU. wavedec2GPU performs the Nth-level decomposition of an input image using Haar wavelets on the GPU. wdencmpGPU performs hard thresholding using a global threshold supplied by the user and then reconstructs the image from the detail and approximation coefficients produced by wavedec2GPU. To perform the next level of decomposition, the sub-reference and sub-assignment operators for GPU data types in Jacket were used.
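The role of sub-reference and sub-assignment in a multi-level decomposition can be illustrated on the CPU with plain Python lists. This sketch is not the Jacket code: `haar_decompose` and its helpers are hypothetical names, the image is assumed to be square with a side length divisible by 2 to the power of `levels`, and the unnormalized averaging-and-differencing step is assumed.

```python
def haar_rows(block):
    """Averaging and differencing applied to every row of a block."""
    out = []
    for row in block:
        half = len(row) // 2
        avg = [(row[2 * i] + row[2 * i + 1]) / 2 for i in range(half)]
        dif = [(row[2 * i] - row[2 * i + 1]) / 2 for i in range(half)]
        out.append(avg + dif)
    return out


def transpose(matrix):
    return [list(row) for row in zip(*matrix)]


def haar_level(block):
    """One decomposition level: rows first, then columns (via transposes)."""
    return transpose(haar_rows(transpose(haar_rows(block))))


def haar_decompose(image, levels):
    """Repeatedly transform the top-left approximation block in place."""
    result = [[float(v) for v in row] for row in image]
    size = len(result)
    for _ in range(levels):
        # "Sub-reference": read the current approximation sub-block.
        sub = [row[:size] for row in result[:size]]
        sub = haar_level(sub)
        # "Sub-assignment": write the transformed sub-block back.
        for r in range(size):
            result[r][:size] = sub[r]
        size //= 2  # the next level works on the new approximation quadrant
    return result


if __name__ == "__main__":
    flat = [[8] * 4 for _ in range(4)]
    for row in haar_decompose(flat, 2):
        print(row)  # only the top-left entry (8.0) is nonzero
```

Each level touches a quarter of the data handled by the previous one, so the full result stays in a single array and no extra storage is needed per level.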

In the wavelet decomposition function (wavedec2GPU), the number of thread blocks equals the number of rows of the input image, and the number of threads per block equals half the number of columns. Thus, for a 512x512 image there are 512 blocks of 256 threads each, and each thread computes the sum and difference of one pair of adjacent pixels. The same procedure is then performed on each column, which completes one level of decomposition. To perform the next level of decomposition, the approximation coefficients are supplied as input to the wavedec2GPU function.
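The block and thread mapping described above can be emulated sequentially in Python to show which pixels each thread touches. This is a CPU emulation under stated assumptions, not the actual CUDA kernel: `launch_dims`, `row_kernel` and `launch_row_pass` are hypothetical names, any normalization factor is omitted, and the output layout (sums in the left half of each row, differences in the right half) is assumed.

```python
def launch_dims(n_rows, n_cols):
    """One block per image row, one thread per pair of adjacent pixels."""
    return n_rows, n_cols // 2


def row_kernel(image, block_idx, thread_idx, out):
    """Work done by a single 'thread': sum and difference of one pixel pair."""
    row = image[block_idx]
    half = len(row) // 2
    a = row[2 * thread_idx]
    b = row[2 * thread_idx + 1]
    out[block_idx][thread_idx] = a + b         # sum goes to the left half
    out[block_idx][half + thread_idx] = a - b  # difference goes to the right half


def launch_row_pass(image):
    """Run the kernel for every (block, thread) pair, sequentially."""
    n_blocks, n_threads = launch_dims(len(image), len(image[0]))
    out = [[0] * len(image[0]) for _ in image]
    for block_idx in range(n_blocks):
        for thread_idx in range(n_threads):
            row_kernel(image, block_idx, thread_idx, out)
    return out


if __name__ == "__main__":
    print(launch_dims(512, 512))  # (512, 256)
    print(launch_row_pass([[1, 2, 3, 4],
                           [5, 6, 7, 8]]))
    # [[3, 7, -1, -1], [11, 15, -1, -1]]
```

Because every thread reads and writes its own pair of locations, the pairs are independent of one another, which is what makes the row pass straightforward to run in parallel on the GPU.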

The following table shows the speed-up obtained by comparing the performance of MATLAB's built-in wavedec2 and wdencmp functions with that of their GPU variants (wavedec2GPU and wdencmpGPU), implemented using the Jacket SDK and its capability to handle GPU data types. The Nth-level decomposition is performed on a sub-region of the processed image using Jacket's sub-reference and sub-assignment functions. Tests were performed on 512x512 head-scan CT images for varying decomposition levels. Running on an NVIDIA Tesla C1060 and compared with CPU-based MATLAB on an Intel Xeon CPU at 2.5 GHz with 4 GB RAM, Jacket achieves a 38x speedup. The 2D DWT image compression technique is suitable for real-time compression, transmission, and fast decompression, with high compression ability and results usable for diagnostic purposes.

Figure: Original head-scan CT image and reconstructed image (compression: 3.0445). Haar wavelet 4th-level decomposition; hard thresholding: 12.344.

The Authors

Indian Institute of Technology, Roorkee

  • Jaideep Singh
  • Ipseeta Aruni
  • Dr. R. Balasubramanian


  • [1] Semmlow, J. L., Biosignal and Biomedical Image Processing, Marcel Dekker Inc., ISBN 978-0-471-76777-0
  • [2]
  • [3] Simek, V.; Asn, R. R., "GPU Acceleration of 2D-DWT Image Compression in MATLAB with CUDA," Computer Modeling and Simulation, 2008. EMS '08. Second UKSIM European Symposium on, 2008, pp. 274-277
