HEVC/H.265 adopts many recent coding techniques to improve compression efficiency. However, this gain in coding performance comes with a sharp increase in computational complexity, which makes software and hardware implementations difficult. In software, the number of intra prediction modes increases to 35, and the best CU partition structure must also be determined during prediction coding. As a result, prediction coding is computationally very expensive and encoding time increases greatly. In hardware, HEVC/H.265 transform coding supports multiple transforms of several, larger sizes. As the transform size grows, the number of multiplications rises dramatically, which wastes hardware resources. To address these problems, this dissertation studies prediction and transform coding in HEVC/H.265. By analyzing texture complexity, different initial positions and traversal orders are selected in prediction coding, which reduces HEVC/H.265 encoding time. In addition, hardware design methods for multiple multipliers are studied to obtain an efficient HEVC/H.265 transform circuit. The work and contributions of this dissertation are as follows:

(a) The selection process for the best CU partition structure is optimized, and a fast CU partition structure decision algorithm for HEVC/H.265 intra prediction coding is proposed. The algorithm selects different initial positions and traversal orders by analyzing texture complexity, which decreases the number of iterations needed to reach the best CU partition structure. In addition, by analyzing the state after each iteration, the algorithm decides whether the traversal can be terminated early, further reducing the number of iterations in prediction coding. The proposed algorithm is implemented on HM15.0 and applied to encode the test sequences. Experimental results show that the proposed
algorithm reduces encoding time by 31.14% while maintaining similar compression efficiency, which greatly improves encoding efficiency.

(b) Hardware design methods for multiple multipliers are studied, and a low-redundancy implementation framework for multiple multipliers is proposed. In this framework, the multiplier vector is decomposed into the product of a “private” matrix and a “public” vector, so the “public” operations shared by different multipliers can be extracted and computed once. Redundant operations in the hardware implementation are thereby removed. However, the best decomposition is difficult to obtain. To verify the validity of the proposed framework, an exhaustive-search-based optimization method is proposed. With this method, the HEVC/H.265 transform coefficients of different sizes can be decomposed, so the proposed framework can be applied to the implementation of HEVC/H.265 transform coding to reduce redundant operations. However, the exhaustive-search method has two drawbacks: the quality of the resulting decomposition is less than desirable, and the regularity of the circuit cannot be guaranteed.

(c) The decomposition of the multiplier vector is reexamined to overcome these drawbacks, and a vector decomposition algorithm based on an over-complete basis is proposed. The algorithm constrains the coefficients obtained after decomposition, which leads to a greedy optimization method. In this method, every element can be implemented with a regular structure, which keeps the whole implementation regular. In addition, the HEVC/H.265 transform coefficients of different sizes can be decomposed with the proposed method. Experimental results show that it reduces the redundant operations in the HEVC/H.265 transform implementation. The proposed method can therefore be applied in the hardware design of the HEVC/H.265 transform to achieve an efficient and
regular architecture.

(d) A low-redundancy HEVC/H.265 transform implementation is proposed based on the existing PBMM structure. It applies the proposed low-redundancy implementation framework to remove redundant operations in the PBMM. The implementation is evaluated with Synopsys DC using the TSMC 0.138) CMOS process. Experimental results show that, compared with the existing implementation, the proposed implementation requires fewer hardware resources and is more regular. In addition, it meets real-time requirements.
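Contribution (a) uses the texture complexity of a block to choose where the CU partition search starts and in which direction it proceeds. The Python sketch below illustrates that idea under several assumptions that do not come from the dissertation. Texture complexity is measured as the population variance of the luma samples. The thresholds `T_LOW` and `T_HIGH` are hypothetical. The CU quadtree is collapsed to a single depth choice per 64x64 CTU. Finally, `rd_cost` is a caller-supplied stand-in for the encoder's rate-distortion check, not an HM15.0 function.

```python
from statistics import pvariance

# Hypothetical thresholds; the dissertation's actual measure and values are not given here.
T_LOW = 50.0
T_HIGH = 500.0


def texture_complexity(block):
    """Population variance of the samples in a 2-D block (list of rows), an assumed texture measure."""
    return pvariance([p for row in block for p in row])


def initial_depth_and_order(block):
    """Pick a starting CU depth (0 = 64x64 ... 3 = 8x8) and a traversal direction.

    Smooth blocks start shallow and search toward smaller CUs;
    highly textured blocks start deep and search toward larger CUs.
    """
    c = texture_complexity(block)
    if c < T_LOW:
        return 0, "top_down"
    if c > T_HIGH:
        return 3, "bottom_up"
    return 1, "top_down"


def traverse(start_depth, order, rd_cost):
    """Visit depths from start_depth in the chosen direction.

    Stops early as soon as the next depth does not lower the cost
    (an assumed early-termination rule). rd_cost maps depth -> cost.
    """
    step = 1 if order == "top_down" else -1
    best_depth, best_cost = start_depth, rd_cost(start_depth)
    d = start_depth + step
    while 0 <= d <= 3:
        cost = rd_cost(d)
        if cost >= best_cost:
            break
        best_depth, best_cost = d, cost
        d += step
    return best_depth, best_cost
```

In this sketch, a flat block starts at depth 0 and searches toward smaller CUs, and a checkerboard-like block starts at depth 3 and searches toward larger ones. The early exit is a simplified version of terminating the traversal once further iterations stop paying off.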
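The “private” matrix times “public” vector idea in contribution (b) can be illustrated on the odd-part coefficients 89, 75, 50 and 18 of the HEVC/H.265 8-point transform. In the Python sketch below, the public vector [x, 9x, 25x] and the private matrix `M` are a hypothetical hand-picked decomposition. They are not the decomposition found by the dissertation's exhaustive search. Adder cost is estimated with an assumed model. Each constant or matrix row costs its number of nonzero canonical signed digit (CSD) digits minus one. The public terms are built with the explicit shift-add chain shown.

```python
# Odd-part coefficients of the HEVC/H.265 8-point transform.
COEFFS = [89, 75, 50, 18]

# Hypothetical private matrix: each row expresses one coefficient over the public terms [1, 9, 25].
M = [
    [64, 0, 1],  # 89 = 64*1 + 25
    [0, 0, 3],   # 75 = 3*25 = (25 << 1) + 25
    [0, 0, 2],   # 50 = 25 << 1
    [0, 2, 0],   # 18 = 9 << 1
]
PUBLIC_ADDERS = 2  # 9x and 25x below each cost one adder


def public_vector(x):
    """Shared ("public") terms, computed once and reused by every multiplier."""
    p1 = (x << 3) + x   # 9x
    p2 = (x << 4) + p1  # 25x
    return [x, p1, p2]


def outputs(x):
    """Products c * x for every c in COEFFS, formed as M times the public vector."""
    p = public_vector(x)
    return [sum(m * v for m, v in zip(row, p)) for row in M]


def csd_digits(n):
    """Canonical signed digit (CSD) digits of a non-negative integer, LSB first."""
    digits = []
    while n != 0:
        if n % 2 == 0:
            digits.append(0)
        else:
            d = 2 - (n % 4)  # +1 when n % 4 == 1, -1 when n % 4 == 3
            digits.append(d)
            n -= d
        n //= 2
    return digits


def nonzero(n):
    """Number of nonzero CSD digits of |n|."""
    return sum(1 for d in csd_digits(abs(n)) if d != 0)


def direct_adders(coeffs):
    """Adders when each constant multiplier is built independently in CSD form."""
    return sum(max(nonzero(c) - 1, 0) for c in coeffs)


def private_adders(matrix):
    """Adders needed to combine the shifted public terms in each private-matrix row."""
    return sum(max(sum(nonzero(e) for e in row) - 1, 0) for row in matrix)
```

Under this cost model, the four products need 9 adders when each constant is implemented separately. When the shared terms 9x and 25x are computed once, they need only 4 adders: 2 for the public terms and 2 in the private matrix.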
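Contribution (c) replaces exhaustive search with a greedy decomposition over an over-complete basis. The Python sketch below is an assumed illustration only. It constrains every private coefficient to a signed power of two and builds the over-complete basis from all signed shifts of a public vector. At each step, it picks the basis element closest to the remaining residual. The public vector [1, 9, 25], the shift range `max_shift` and the term limit `max_terms` are hypothetical parameters. A greedy choice is not guaranteed to give the minimum number of adders.

```python
def greedy_decompose(c, public, max_shift=8, max_terms=4):
    """Greedily write integer c as a sum of terms s * (public[j] << k), with s in {+1, -1}.

    Returns the chosen (s, k, j) labels and the residual left over
    (0 when the decomposition is exact within max_terms terms).
    """
    # Over-complete basis: every signed shift of every public term.
    basis = [(s * (p << k), (s, k, j))
             for j, p in enumerate(public)
             for k in range(max_shift + 1)
             for s in (1, -1)]
    residual, terms = c, []
    while residual != 0 and len(terms) < max_terms:
        # Pick the basis element that brings the residual closest to zero.
        value, label = min(basis, key=lambda b: abs(residual - b[0]))
        terms.append(label)
        residual -= value
    return terms, residual


public = [1, 9, 25]
for c in [64, 83, 36, 89, 75, 50, 18]:
    terms, residual = greedy_decompose(c, public)
    print(c, terms, residual)
```

Bounding the number of terms per coefficient gives every output the same shallow adder structure, which is one way to read the regularity requirement. With the hypothetical public vector, each of the HEVC/H.265 coefficients 64, 83, 36, 89, 75, 50 and 18 decomposes exactly in at most three terms.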