| Compared with the H.264 standard, the H.265/HEVC video coding standard not only inherits the strengths of H.264 but also introduces many new techniques, including flexible quadtree-based picture partitioning and multi-angle intra prediction. These techniques preserve video quality while reducing the bit rate by half. However, the processing complexity of HEVC has increased nearly two- to threefold relative to H.264. In addition, because HEVC is designed for encoding and decoding high-definition and ultra-high-definition video, which imposes a heavier workload than ordinary video, the standard provides multiple levels of parallel coding methods. At the same time, the development of multi-core processors provides the hardware basis for implementing parallel codecs. As a result, parallel codec technology based on the HEVC standard has become a research topic for many scholars at home and abroad. This thesis takes the 36-core Tilera-Gx36 multi-core processor as its research platform and studies parallel processing techniques for the main modules of HEVC decoding, together with their implementation on a multi-core processor. The main contributions of this thesis are as follows: 1. A pixel decoding reconstruction algorithm based on multithreaded load balancing is designed and implemented, enabling multithreaded parallel decoding in the pixel decoding reconstruction module. Because of the texture characteristics of image regions and the unpredictability of CTU quadtree partitioning, decoding complexity, and therefore decoding time, differs from CTU to CTU. If a thread has not finished decoding its current CTU, the CTUs that depend on that CTU must wait until its decoding completes. In the meantime, thread resources remain tied up, which wastes multi-core resources. To solve this problem, a CTU complexity estimation algorithm is proposed. The decoding complexity of a CTU is determined by the 
depth of the CTU partition and by the partitioning of CUs into PUs. Groups of CTUs with the same complexity are assigned to multiple threads for parallel processing, achieving multithreaded load balancing in pixel decoding reconstruction. 2. A parallel decoding algorithm based on data structure optimization is designed and implemented. The decoding status of CTUs is among the most frequently accessed data during decoding, and ordinary array storage is poorly suited to fast multithreaded reads and writes. To solve this problem, a scheme is designed that uses a bitmap to record CTU decoding status, which saves storage space and improves query efficiency. Quadtree partitioning is a new CTU partitioning method introduced by HEVC, and an enhanced quadtree data structure is built to represent it. Only the leaf nodes of the quadtree store decoding information; before decoding, the CTU decoding status stored in the bitmap is read, and then the current CTU is decoded. This enhanced quadtree saves the storage that non-leaf nodes would otherwise require and improves the efficiency and stability of data indexing. In addition, a set of supporting methods, such as storage and query operations, is implemented to further improve the efficiency of data reads and writes. 3. A parallel decoding scheme that fuses data-level and task-level parallelism using read-write locks is designed and implemented. By introducing read-write locks into data structures such as the bitmap and the quadtree, multiple threads can read shared data simultaneously, while the resource conflicts that arise when multiple threads access decoding information are avoided. Combined with the characteristics of the Tilera multi-core platform, the parallel decoding algorithm with data-level and task-level fusion further improves decoding efficiency. All designs are implemented on the Tilera-Gx36 multi-core processing platform, using the latest libde265 as the reference software, and are tested by decoding multiple video sequences 
including high-definition and ultra-high-definition sequences. According to the experimental results, the proposed pixel decoding reconstruction algorithm based on multithreaded load balancing improves the parallel speedup ratio by about 7.9% on average compared with the previous HEVC parallel decoding algorithm based on core module fusion. The parallel decoding scheme based on data structure optimization improves the parallel speedup ratio by about 22.9% and 3.4% compared with the OWF-based and the CTU-based HEVC intra/inter-frame fusion parallel decoding algorithms, respectively. The read-write-lock-based parallel decoding algorithm with data-level and task-level fusion improves the parallel speedup ratio by about 11.2% on average compared with the HEVC parallel decoding algorithm based on core module fusion. |
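
The bitmap idea in contribution 2 can be illustrated with a minimal sketch. This is not the thesis implementation and not libde265 code: the class `CtuStatusBitmap`, its method names, and the `dependencies` argument are hypothetical, and a plain mutex stands in for the read-write lock described in contribution 3, since Python's standard library does not provide one.

```python
import threading


class CtuStatusBitmap:
    """One bit per CTU: 1 means decoded, 0 means not yet decoded (hypothetical sketch)."""

    def __init__(self, num_ctus):
        self.num_ctus = num_ctus
        # Pack 8 CTU flags into each byte instead of one array entry per CTU.
        self.bits = bytearray((num_ctus + 7) // 8)
        # Assumption: a plain mutex stands in for a read-write lock.
        self.lock = threading.Lock()

    def mark_decoded(self, ctu_index):
        """Set the bit for a CTU once its decoding has finished."""
        with self.lock:
            self.bits[ctu_index >> 3] |= 1 << (ctu_index & 7)

    def is_decoded(self, ctu_index):
        """Return True if the CTU's bit is set."""
        with self.lock:
            return bool(self.bits[ctu_index >> 3] & (1 << (ctu_index & 7)))

    def can_decode(self, dependencies):
        """Return True when every CTU index in `dependencies` is already decoded."""
        return all(self.is_decoded(d) for d in dependencies)


# A 1920x1080 picture with 64x64 CTUs has 30 x 17 = 510 CTUs.
status = CtuStatusBitmap(510)
status.mark_decoded(31)
```

With one bit per CTU, the 510 flags in this example fit in 64 bytes, whereas an array with one byte per CTU would need 510 bytes. A worker thread would call `can_decode` with the indices of the CTUs it depends on before starting work on its own CTU.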