Parallel programming is a well-established way to tackle complex software problems that process large amounts of data, by bringing more computing resources to the table. It exploits the computational power of the underlying hardware, which may be dedicated hardware, a CPU, a GPU, or a combination of these. Camera products usually include an ISP (image signal processor) for the camera image sensor, and the ISP implements many complicated functions. The functions in the ISP chain are designed to handle high-quality images, which involves a lot of floating-point mathematical operations. These operations are parallel in nature: the data elements do not depend on one another, which makes the ISP chain an ideal use case for parallel programming. VLIW (very long instruction word) and SIMD (single instruction, multiple data) are two good options for exploiting this parallelism.


Image sensors are used in image-capturing devices such as cameras, camcorders, and CCTV cameras. Besides the traditional purpose of storage, images are now used for authentication, security, and many other purposes. The role of image enhancement is therefore more important than before. The image quality of a raw capture is enhanced using ISP algorithms such as denoising and white balancing, among many other functions. The latest ISP algorithms, which iterate and adapt their selections to the environment, produce excellent image quality.

ISP algorithms can be executed on dedicated hardware or on general-purpose hardware such as a CPU. Dedicated hardware produces high-quality images with high efficiency, but its performance cannot be scaled. Running on general-purpose hardware, by contrast, leaves scope for both efficiency and flexibility.

Before ISP algorithms can be parallelized, the source code must be divided into a data-processing part and a control-processing part, so that instruction sets such as VLIW and SIMD can be fully utilized to execute the parallel parts of the algorithms. The control part of an ISP algorithm runs in scalar mode, while the parallel part runs in SIMD or VLIW mode.
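The split between control and data processing can be sketched in portable C. This is a minimal illustration, not code from any real ISP: the `frame_info` structure, the function names, and the Q8 fixed-point gain are all assumptions chosen to keep the example short (a real pipeline might use floating point instead). The control part runs once per frame and only picks parameters; the data part applies the same arithmetic to every pixel, so it is the loop a SIMD or VLIW target would execute in parallel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-frame settings; the field name is illustrative only. */
typedef struct {
    int low_light; /* control input: nonzero when the scene is dark */
} frame_info;

/* Control part: runs once per frame in scalar mode and only selects
 * parameters. Gains are Q8 fixed point: 256 means 1.0, 512 means 2.0. */
static uint16_t select_gain_q8(const frame_info *info)
{
    return info->low_light ? 512u : 256u;
}

/* Data part: identical arithmetic on every pixel, and no pixel depends on
 * another, so this loop is the candidate for SIMD or VLIW execution. */
static void apply_gain_q8(const uint8_t *src, uint8_t *dst, size_t n,
                          uint16_t gain_q8)
{
    for (size_t i = 0; i < n; ++i) {
        uint32_t v = ((uint32_t)src[i] * gain_q8) >> 8; /* multiply, then shift */
        dst[i] = (uint8_t)(v > 255u ? 255u : v);        /* clip to 8 bits */
    }
}

/* Glue: the control decision is made outside the pixel loop. */
static void process_frame(const frame_info *info, const uint8_t *src,
                          uint8_t *dst, size_t n)
{
    assert(info != NULL && src != NULL && dst != NULL);
    uint16_t gain_q8 = select_gain_q8(info);
    apply_gain_q8(src, dst, n, gain_q8);
}
```

Calling `process_frame` with `low_light` set doubles each pixel and clips the result at 255. Because the mode check happens once per frame rather than once per pixel, the inner loop carries no control flow of its own.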

VLIW mode uses instruction-level parallelism to handle the parallel part of the algorithm. The control and parallel parts can execute simultaneously; however, the larger the control part of an algorithm, the more barriers there are that limit its parallelism. ISP algorithms should therefore be modified to contain as few control instructions as possible.
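One common way to reduce control instructions inside a pixel loop is to turn a per-pixel branch into arithmetic. The sketch below uses black-level subtraction as an assumed example (the function names and the 16-bit pixel type are illustrative, not taken from a particular ISP). Both functions compute the same result; the second replaces the `if` with a mask, leaving only data operations in the loop body.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Branchy form: a compare and a conditional jump for every pixel. */
static void subtract_black_branchy(const uint16_t *src, uint16_t *dst,
                                   size_t n, uint16_t black)
{
    assert(src != NULL && dst != NULL);
    for (size_t i = 0; i < n; ++i) {
        if (src[i] > black)
            dst[i] = (uint16_t)(src[i] - black);
        else
            dst[i] = 0;
    }
}

/* Branch-free form: the comparison yields 0 or 1, which is widened into a
 * mask of 0x0000 or 0xFFFF and ANDed with the difference. */
static void subtract_black_branchless(const uint16_t *src, uint16_t *dst,
                                      size_t n, uint16_t black)
{
    assert(src != NULL && dst != NULL);
    for (size_t i = 0; i < n; ++i) {
        /* Truncated to 16 bits; meaningless when src[i] <= black,
         * but the mask discards it in that case. */
        uint16_t diff = (uint16_t)(src[i] - black);
        uint16_t mask = (uint16_t)(0u - (unsigned)(src[i] > black));
        dst[i] = (uint16_t)(diff & mask);
    }
}
```

An optimizing compiler may already perform this transformation on the branchy version, so the gain depends on the toolchain and target. The point of the sketch is the shape of the loop: once it has no branches, every iteration is the same sequence of operations, which is what instruction-level and data-level parallelism need.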

For porting ISP algorithms to SIMD architectures, Arm Neon technology is one option: it is an advanced Single Instruction Multiple Data (SIMD) architecture extension for the A-profile and R-profile processors. The SIMD operations used are ADD, SHF (shift), CLIP, MUL (multiply) combined with ADD, and MUL combined with SHF.
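The operations listed above can be modelled lane by lane in portable C to show how they compose. This is a scalar stand-in, not Neon code: the `vec16` type, the eight-lane width, the helper names, and the Q6 gain are all assumptions made for illustration. On a SIMD target, each helper loop would ideally become a single vector instruction.

```c
#include <assert.h>
#include <stdint.h>

#define LANES 8 /* assumed: eight 16-bit lanes, as in one 128-bit register */

/* Hypothetical stand-in for a SIMD register. */
typedef struct { int16_t lane[LANES]; } vec16;

/* Broadcast one value to all lanes. */
static inline vec16 v_dup(int16_t x)
{
    vec16 r;
    for (int i = 0; i < LANES; ++i) r.lane[i] = x;
    return r;
}

/* ADD: lane-wise sum; no saturation, so callers keep values in range. */
static inline vec16 v_add(vec16 a, vec16 b)
{
    vec16 r;
    for (int i = 0; i < LANES; ++i) r.lane[i] = (int16_t)(a.lane[i] + b.lane[i]);
    return r;
}

/* SHF: lane-wise right shift; this sketch assumes non-negative lanes. */
static inline vec16 v_shr(vec16 a, int n)
{
    assert(n >= 0 && n < 16);
    vec16 r;
    for (int i = 0; i < LANES; ++i) r.lane[i] = (int16_t)(a.lane[i] >> n);
    return r;
}

/* CLIP: clamp every lane to the range [lo, hi]. */
static inline vec16 v_clip(vec16 a, int16_t lo, int16_t hi)
{
    vec16 r;
    for (int i = 0; i < LANES; ++i) {
        int16_t v = a.lane[i];
        r.lane[i] = v < lo ? lo : (v > hi ? hi : v);
    }
    return r;
}

/* MUL and ADD: acc + a * b per lane. */
static inline vec16 v_mla(vec16 acc, vec16 a, vec16 b)
{
    vec16 r;
    for (int i = 0; i < LANES; ++i)
        r.lane[i] = (int16_t)(acc.lane[i] + a.lane[i] * b.lane[i]);
    return r;
}

/* MUL and SHF: (a * b) >> n per lane, with a 32-bit intermediate product. */
static inline vec16 v_mul_shr(vec16 a, vec16 b, int n)
{
    assert(n >= 0 && n < 31);
    vec16 r;
    for (int i = 0; i < LANES; ++i) {
        int32_t p = (int32_t)a.lane[i] * b.lane[i];
        r.lane[i] = (int16_t)(p >> n);
    }
    return r;
}

/* Example chain: scale by a Q6 gain (64 means 1.0), add an offset, clip to 8 bits. */
static inline vec16 gain_offset_clip(vec16 px, int16_t gain_q6, int16_t offset)
{
    vec16 scaled = v_mul_shr(px, v_dup(gain_q6), 6);
    vec16 shifted = v_add(scaled, v_dup(offset));
    return v_clip(shifted, 0, 255);
}
```

With a gain of 96 (1.5 in Q6) and an offset of 4, an input pixel of 100 becomes 154, and an input of 200 is clipped to 255. When porting to Neon, helpers of this kind would typically be replaced by intrinsics from `arm_neon.h` such as `vaddq_s16`, `vmlaq_s16`, `vminq_s16`, and `vmaxq_s16`; the exact mapping depends on the data types and rounding behaviour the algorithm requires.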