
End-to-End Image And Video Compression

Posted on: 2020-12-31  Degree: Master  Type: Thesis
Country: China  Candidate: T Y He  Full Text: PDF
GTID: 2428330572987268  Subject: Information and Communication Engineering
Abstract/Summary:
Media content such as images and videos is widely used in advertising, medicine, entertainment, security, and other fields. Meanwhile, applications such as ultra-high definition, immersive experiences, and intelligent media services have dramatically increased the challenges of transmission and storage. Over the past several decades, video coding performance has improved by around 50% every 10 years, at the cost of increased computational complexity and memory. At present, on the one hand, it is increasingly difficult to improve coding efficiency further, since the latest coding standard is already quite complex. On the other hand, existing schemes face great challenges in dealing efficiently with novel, sophisticated, and intelligent media applications such as recognition, object tracking, and image retrieval. Both academia and industry therefore have a strong need to explore new video coding directions and frameworks as potential candidates for future video coding schemes, especially given the rapid development of machine learning technologies. This thesis focuses on end-to-end image and video compression methods designed for various requirements. The main contributions and innovations of the thesis can be summarized as follows:

(1) This thesis proposes an end-to-end facial image compression framework. Unlike traditional codecs, which heuristically adjust compression parameters according to embedded quality metrics, the framework automatically optimizes coding parameters using gradient feedback from an integrated hybrid facial image distortion metric calculation module. Experimental results verify the framework's efficiency, demonstrating bit-rate savings of 71.41% and 48.28% over JPEG2000 and WebP, respectively.

(2) This thesis introduces the concept of the Semantically Structured Bit-stream (SSB). Unlike traditional media coding schemes, which encode media into a single binary stream without semantic structure, each part of an SSB represents a certain object. Experimental results demonstrate that objects can be completely reconstructed from a partial bit-stream. They also verify that classification and pose estimation can be performed directly on a partial bit-stream without performance degradation.

(3) This thesis presents PixelMotionCNN (PMCNN), which includes motion extension and hybrid prediction networks and can model spatiotemporal coherence to perform predictive coding effectively inside the learning network. Although entropy coding and complex configurations are not employed in this thesis, experimental results still demonstrate superior performance compared with MPEG-2 and results comparable to the H.264 codec.

(4) This thesis proposes a new end-to-end video compression framework named Memorize-Then-Recall (MTR). Specifically, it decomposes surveillance video signals into a global spatiotemporal feature (memory) and a skeleton for each frame (clues). During decompression, an attention mechanism, inspired by its use in Neural Machine Translation (NMT), is introduced to realize the reconstruction process.
Keywords/Search Tags: Image Compression, Video Coding, End-to-end, Neural Network, Attention Mechanism, Semantic Fidelity, Semantically Structured Bit-stream