Research On Perceptual Audio Hashing

Posted on: 2011-05-16
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y H Jiao
Full Text: PDF
GTID: 1118360332457970
Subject: Computer Science and Technology
Abstract/Summary:
Perceptual hashing, also referred to as robust hashing or fingerprinting, maps digital multimedia data into a compact digital digest. Unlike cryptographic hashing, perceptual hashing is tolerant of content-preserving operations and sensitive to perceptual changes in content. Different digital representations of multimedia are mapped to the same digest when they carry the same perceptual content, while multimedia with different content is mapped to distinct, statistically random hash values. As such, perceptual hashing is well suited to content authentication and to content-based identification, indexing, and retrieval.

The integrity of digital audio is essential to property rights, the credibility of publishers, and even national security. Research on perceptual audio hashing has become an active area of multimedia processing and security. Audio refers to sound that is capable of being heard; music and speech are two typical audio signals. Because they differ in signal characteristics, coding schemes, transmission channels, and so on, specific perceptual hashing algorithms should be developed for each. Music is usually termed wideband audio in signal processing research. There are thus four categories: raw wideband audio, compressed wideband audio, raw speech, and compressed speech.

At present, research on perceptual audio hashing is still at an elementary stage. Although several algorithms have been proposed, a universal model and performance evaluation methods, which are important for algorithm optimization and testing, are still absent. Moreover, most of the proposed algorithms target raw wideband audio and have drawbacks when applied to compressed wideband audio. Furthermore, they cannot be applied to speech authentication because of the differences between wideband audio and speech.
This dissertation systematically summarizes the research status of perceptual audio hashing and studies the modeling and performance evaluation of perceptual hashing. Specific algorithms for compressed wideband audio, raw speech, and compressed speech are developed. The main innovative contributions of this thesis are as follows:

(1) Based on perception theory, a standard description of perceptual hashing is presented, including its definition, technical framework, and properties in mathematical form. Perceptual hashing is modeled as a Markov information source, and the entropy rate of the Markov source is proposed as a joint quantitative measure for performance evaluation. First, the proposed model and measure are independent of the algorithm and suitable for black-box testing. Second, entropy rate is a per-unit amount of information and is not affected by hash size; it can therefore be used to jointly evaluate discrimination power and compactness. Third, the entropy rate has an upper bound and a lower bound, so its value is an absolute indicator of algorithm performance that clearly shows the distance between the tested algorithm and the optimum.

(2) Compressed-domain audio hashing algorithms are proposed. The perceptual hash is calculated from MDCT coefficients obtained by partially decoding the compressed audio bitstream. The proposed method is highly robust to MDCT-based audio compression and transcoding. Because the algorithm involves no complicated transformation, it has low computational complexity. It is practical in scenarios with strict memory and computational constraints, such as online network audio retrieval.

(3) A novel perceptual hashing method for raw speech, based on the speech production model, is proposed. The perceptual hash is calculated from line spectral frequencies (LSFs), which model the vocal tract.
The hash function is key-dependent and collision-resistant. It is also highly robust to content-preserving operations and localizes tampering with high accuracy. Moreover, the proposed method is not restricted to particular speech coders and is practicable for all types of speech communication systems.

(4) Speech coded at very low bit rates requires a hash algorithm with high compactness and robustness. G.729 and MELP are two typical low-bit-rate speech coding standards, and perceptual hashing algorithms integrated with them are proposed. LSFs model the changing shape of the speaker's vocal tract and are an intermediate result of partial decoding; they are used to generate the hash value. The proposed methods satisfy the robustness and discrimination requirements of perceptual hashing at a very low hash bit rate. They are also computationally efficient and can be applied to scenarios with power restrictions or real-time communication requirements.
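
As an illustration of the entropy-rate measure in contribution (1), the following minimal Python sketch estimates the entropy rate of a hash bit sequence. It assumes, purely for illustration, that the hash bits form a first-order binary Markov chain and that empirical transition frequencies stand in for the true probabilities. The function name markov_entropy_rate and this estimator are assumptions, not the model or procedure defined in the dissertation. Under this assumption the rate lies between 0 and 1 bit per symbol.

```python
import math
from collections import Counter


def markov_entropy_rate(bits):
    """Estimate the entropy rate (bits per symbol) of a symbol sequence,
    modeled as a first-order Markov chain (an illustrative assumption).

    H = -sum_a p(a) * sum_b p(b|a) * log2 p(b|a)
    """
    if len(bits) < 2:
        raise ValueError("need at least two symbols")

    # Count observed transitions a -> b and how often each state a
    # appears as the start of a transition.
    pair_counts = Counter(zip(bits, bits[1:]))
    state_counts = Counter(bits[:-1])
    total = len(bits) - 1

    rate = 0.0
    for (a, _b), n_ab in pair_counts.items():
        p_a = state_counts[a] / total        # empirical state probability
        p_b_given_a = n_ab / state_counts[a]  # empirical transition probability
        rate -= p_a * p_b_given_a * math.log2(p_b_given_a)
    return rate


if __name__ == "__main__":
    constant = [0] * 100                    # no information at all
    alternating = [0, 1] * 50               # balanced bits, but fully predictable
    mixed = [0, 0, 1, 1] * 25 + [0]         # every transition equally frequent
    for name, seq in [("constant", constant),
                      ("alternating", alternating),
                      ("mixed", mixed)]:
        print(name, markov_entropy_rate(seq))
```

In this sketch an alternating sequence scores 0 even though its bits are balanced, because each bit is fully determined by its predecessor, while a sequence in which all transitions are equally frequent reaches the 1 bit per symbol upper bound. This mirrors the abstract's point that a bounded entropy rate shows how far a tested hash is from the optimum, independent of hash length.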
Keywords/Search Tags: Perceptual Hashing, audio, speech, multimedia content authentication, performance evaluation, compressed domain algorithm