| Since the start of the 21st century, research on and applications of computer vision have flourished. Visual object tracking has been successfully applied in video surveillance, security, drones, driverless cars, and other fields. As a result, the demands on the tracking accuracy and speed of object tracking algorithms in visual tracking systems are steadily increasing. This dissertation focuses on object tracking algorithms and GPU-based parallel computing. It first reviews advanced object tracking algorithms, both domestic and international. The kernelized correlation filter algorithm is robust to motion and illumination changes but not to deformation and rotation, whereas the distractor-aware tracking algorithm behaves in the opposite way. To combine their complementary advantages, a parallel tracking algorithm is proposed in which the distractor-aware tracking algorithm and the kernelized correlation filter algorithm are first connected in series, and the resulting series branch then runs in parallel with the kernelized correlation filter algorithm. In addition, to compensate for the parallel tracking algorithm's weakness in handling scale variation, a scale filter is added for scale prediction. The algorithm is evaluated on the standard OTB-100 test set, and the tracking results on 30 sequences with specific tracking difficulties are compared. The experimental results show that the parallel tracking algorithm outperforms both sub-algorithms in tracking success rate and in tracking precision as measured by center location error, and that it effectively combines their complementary advantages, handling scale variation, rotation, motion, and other tracking difficulties simultaneously. Next, to improve the computation speed of the object tracking algorithm, the parallel tracking algorithm is optimized for the GPU’s Compute Unified Device Architecture (CUDA), exploiting the hardware characteristics of GPU parallel computing. For the 
candidate box score calculation module, a one-dimensional thread layout is used to parallelize the original procedure directly. For the non-maximum suppression module, the workflow is redesigned: the original serial loop structure is converted into three steps suitable for parallel execution (score calculation, maximum computation, and global comparison), and parallel reduction is used to accelerate the maximum computation. For the FHOG feature extraction module, the program flow is divided into pixel gradient calculation, cell energy statistics, and cell gradient histogram calculation, and each step is parallelized on a per-pixel or per-cell basis. For the kernel correlation calculation module, the computation over the corresponding elements of two three-dimensional arrays is divided into three steps (sum of squares, dot product, and negative exponential); the data are treated as multi-channel matrices and mapped onto two-dimensional or three-dimensional thread organizations so that each thread performs the calculation for its corresponding element. Finally, verification experiments for the optimized design of the parallel tracking algorithm are carried out on the GPU's CUDA platform. For each of the four modules, the running times on the CPU and GPU and the resulting speedup are reported, and the trends and influencing factors of each module's speedup are summarized. A verification experiment on the parallel tracking algorithm as a whole achieves a speedup of up to 3.6 times. |
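
The parallel reduction used for the maximum computation can be illustrated with a minimal sketch. The Python code below emulates, sequentially, the stride-halving pattern that a CUDA thread block would typically execute in shared memory. The function names, the block size, and the choice of Python over CUDA are assumptions made for illustration only; this is not the dissertation's implementation.

```python
import math


def block_max_reduce(scores, block_size=8):
    """Emulate one reduction pass: each 'block' reduces its tile to a single maximum.

    block_size is a hypothetical stand-in for the number of threads per block
    and must be a power of two >= 2 so the stride can be halved cleanly.
    """
    if block_size < 2 or block_size & (block_size - 1) != 0:
        raise ValueError("block_size must be a power of two >= 2")
    partial = []
    for start in range(0, len(scores), block_size):
        # Load the tile; pad out-of-range slots with -inf, the identity for max.
        tile = list(scores[start:start + block_size])
        tile += [-math.inf] * (block_size - len(tile))
        stride = block_size // 2
        while stride > 0:
            # On a GPU, these 'tid' iterations would run concurrently (one per
            # thread), with a barrier before the stride is halved.
            for tid in range(stride):
                tile[tid] = max(tile[tid], tile[tid + stride])
            stride //= 2
        partial.append(tile[0])  # slot 0 now holds the tile's maximum
    return partial


def parallel_max(scores, block_size=8):
    """Apply reduction passes until a single global maximum remains."""
    if len(scores) == 0:
        raise ValueError("scores must not be empty")
    values = list(scores)
    while len(values) > 1:
        values = block_max_reduce(values, block_size)
    return values[0]


if __name__ == "__main__":
    candidate_scores = [0.3, 0.9, 0.1, 0.7, 0.5, 0.2, 0.8, 0.4, 0.6]
    print(parallel_max(candidate_scores))
```

Each pass needs only log2(block_size) synchronized steps per block instead of a serial scan, which is why this pattern suits the GPU. For non-maximum suppression the same scheme can reduce (score, index) pairs so that the location of the maximum is retained as well.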