With the rapid development of digital information technology, digital audio has replaced analog audio signals as the dominant audio medium, owing to the convenience of its compression and transmission. However, modern audio processing techniques make it easy to modify audio content and its time sequence through tampering, replacement, rearrangement, and similar operations, which seriously violates the content integrity and originality of digital audio data. As a result, it is more important than ever to ascertain the authenticity of audio data, especially with respect to its semantic meaning and perceptual quality.

So far, many audio authentication algorithms have been proposed, but they share a common drawback: the authentication system cannot work properly when the time sequences of the dubious and reference audio files are not aligned. This commonly occurs when de-synchronizing attacks such as multiple cropping or insertion take place, or when the dubious audio piece is a fragment of the reference audio file.

This thesis presents our work on the following aspects:

First, the existing audio authentication methods are introduced. Classified as either semi-fragile watermarking or robust audio signature methods, they serve various needs at different security levels. We also discuss their common drawbacks regarding the de-synchronization problem.

In the research on anchor-point-based audio authentication, we propose an approach that focuses on providing valid content authentication results even when one or more de-synchronizing attacks take place. In particular, the audio piece is segmented by anchor points into a series of non-overlapping authentication units, which are then aligned with the reference audio; according to the authentication results given by an offline-trained supervised model, attack positions can be properly detected.
Experiments show encouraging content authentication results compared with previous approaches.

In the research on content-based audio fragment authentication using SIFT key points, a novel algorithm is proposed for audio fragment authentication, which is a typical instance of the de-synchronized time sequence problem. The SIFT descriptor, which originated in the computer vision field, is computed on the audio spectrogram to accomplish tasks such as fragment alignment, time-domain blocking, and the identification of cropping and insertion.