
Feature extraction for audio classification

Posted on: 2004-08-04
Degree: M.A.Sc
Type: Thesis
University: Carleton University (Canada)
Candidate: Abu-El-Quran, Ahmad Rami
Full Text: PDF
GTID: 2468390011460539
Subject: Engineering
Abstract/Summary:
This thesis proposes a new algorithm to discriminate between speech and non-speech audio segments. It is intended for security applications as well as talker location identification in audio conferencing systems equipped with microphone arrays.

The proposed method splits the audio segment into small frames and detects the presence of pitch in each of them. The ratio of frames in which pitch is detected to the total number of frames is defined as the pitch ratio and is used as the main feature to classify speech and non-speech segments. The performance of the proposed method is evaluated using a library of audio segments containing female and male speech, and non-speech segments such as cocktail noise, footsteps, and traffic noise.

Two major contributions are proposed for speech/non-speech discrimination. The pitch ratio algorithm is proposed to give a robust decision for each class. Limiting the search to the human pitch range is also proposed to enhance the performance of the pitch ratio algorithm. In addition, non-speech audio type classification using neural networks is proposed.

It is shown that, for segments 0.5 seconds long, the proposed algorithm achieves a correct-decision rate of 97% for speech segments and 98% for non-speech segments.
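As a rough illustration of the pitch ratio feature described in the abstract, the sketch below splits a segment into frames, marks a frame as pitched when its normalized autocorrelation has a strong peak inside an assumed human pitch range, and reports the fraction of pitched frames. The abstract does not say which pitch detector the thesis uses, so the autocorrelation test, the 80-400 Hz range, the 30 ms frame length, and both thresholds are assumptions chosen for illustration, and the function names are hypothetical.

```python
import numpy as np


def frame_has_pitch(frame, sample_rate, fmin=80.0, fmax=400.0, peak_threshold=0.5):
    """Return True if the frame shows a periodicity within [fmin, fmax] Hz.

    Assumption: pitch is detected via a normalized autocorrelation peak.
    The thesis abstract does not specify the detector actually used.
    """
    frame = np.asarray(frame, dtype=float)
    frame = frame - frame.mean()
    if np.dot(frame, frame) <= 1e-12:
        return False  # silent frame: nothing to detect

    # Autocorrelation for non-negative lags, normalized so lag 0 equals 1.
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    ac = ac / ac[0]

    # Restrict the search to lags that correspond to the assumed pitch range.
    lag_min = int(sample_rate / fmax)
    lag_max = min(int(sample_rate / fmin), len(ac) - 1)
    if lag_max <= lag_min:
        return False
    return bool(np.max(ac[lag_min:lag_max + 1]) >= peak_threshold)


def pitch_ratio(segment, sample_rate, frame_seconds=0.03):
    """Fraction of non-overlapping frames in which pitch is detected."""
    segment = np.asarray(segment, dtype=float)
    frame_len = int(round(frame_seconds * sample_rate))
    num_frames = len(segment) // frame_len
    if num_frames == 0:
        return 0.0
    pitched = sum(
        frame_has_pitch(segment[i * frame_len:(i + 1) * frame_len], sample_rate)
        for i in range(num_frames)
    )
    return pitched / num_frames


def classify_segment(segment, sample_rate, ratio_threshold=0.5):
    """Label a segment by thresholding its pitch ratio (threshold is assumed)."""
    if pitch_ratio(segment, sample_rate) >= ratio_threshold:
        return "speech"
    return "non-speech"
```

A pure tone inside the assumed pitch range should give a pitch ratio near 1, while white noise should give a ratio near 0. Real speech contains unvoiced and silent frames, so its ratio would fall somewhere in between, and the decision threshold would need tuning on labeled data.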
Keywords/Search Tags:Audio, Non-speech, Segments, Algorithm