| Robust motion estimation has not only become an important topic in current video compression technology but also plays an increasingly important role in video processing. Because its principle is simple and it is easy to implement, block matching has become increasingly popular among motion vector estimation algorithms, and a growing number of published studies address it. However, the full search algorithm is too inefficient for some applications. Traditional fast block matching algorithms improve search speed, but they are easily trapped in local optima, which significantly degrades the accuracy of the motion vectors. In other words, traditional algorithms cannot fully meet requirements that call for relatively high motion vector accuracy. In view of the advantages of block matching, this article presents the principles and implementation steps of several block matching algorithms and summarizes the methods used to optimize the traditional algorithms. Then, building on the strengths of these algorithms and addressing their shortcomings, it proposes an improved algorithm that combines full search with spatial and temporal correlation. The improved algorithm first performs a full search on the video sequence, then exploits the spatial and temporal correlation between blocks used by the 3-D recursive search algorithm, and applies a termination criterion to improve efficiency and refine the motion vectors. 
The experimental results show that the improved algorithm not only preserves motion vector accuracy but also strengthens the spatial and temporal coherence between blocks, which brings the motion vectors closer to the true motion. To address the high time complexity of the improved algorithm, this article optimizes it extensively using the parallel computing power of the GPU and the features of OpenCL. Each step of the improved algorithm is converted into a kernel that can be executed on the OpenCL platform. Depending on the purpose of each kernel, different optimizations are designed, including data storage, thread distribution, and reduction of data dimensionality. The final experimental results show that the optimized algorithm on the GPU platform is several times faster than on the CPU platform and can meet the real-time requirement. Finally, this article summarizes the work and outlines directions for future research. |
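
As a rough illustration of the full search stage that the improved algorithm starts from, the following Python sketch matches a single block and stops early once the matching cost reaches a threshold. The sum of absolute differences (SAD) cost, the function names `sad` and `full_search`, and the parameters `n`, `radius`, and `stop_at` are assumptions made for this example and are not taken from the article. The spatial and temporal candidate step and the OpenCL kernels are not shown.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between the n x n block of `cur` at (bx, by)
    and the block of `ref` displaced by (dx, dy). Frames are lists of rows."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def full_search(cur, ref, bx, by, n=4, radius=2, stop_at=0):
    """Exhaustively test every displacement within +/- radius pixels.

    Returns (dx, dy, cost) for the lowest-cost candidate. The search ends
    early as soon as a candidate's cost is <= stop_at (hypothetical
    termination rule used only for this sketch).
    """
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            # Skip candidates whose block would fall outside the reference frame.
            if not (0 <= bx + dx <= width - n and 0 <= by + dy <= height - n):
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best is None or cost < best[2]:
                best = (dx, dy, cost)
                if cost <= stop_at:
                    return best
    return best
```

Each block's search is independent of the others, so in a parallel setting one natural choice is to assign blocks or candidate displacements to separate threads; how the article actually distributes the work is not specified here.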