
Research Of Audio Classification And Segmentation

Posted on: 2005-04-25 | Degree: Master | Type: Thesis
Country: China | Candidate: L Bai
GTID: 2168360155971789 | Subject: Control Science and Engineering
Abstract/Summary:
With the development of computer, network and communication technology, multimedia data, including images, video and audio, has become the primary information medium in information processing. Audio data is an important part of multimedia applications. Raw audio is a non-semantic, unstructured binary data stream that lacks both a description of its semantic content and a structured organization. These characteristics make audio information processing and analysis considerably more difficult. Extracting structural information and semantic content from raw audio is therefore key to audio information processing, content-based audio retrieval, and audio-assisted video structure parsing. Audio classification and segmentation are key technologies for solving these problems and form the basis for structuring audio. Building on previous research, this dissertation studies audio classification and segmentation, addressing audio structure analysis, the analysis and extraction of features for audio classification, an SVM-based classifier, and an entropy-based audio segmentation algorithm using dynamic programming. The research work and results of this dissertation are summarized as follows:
(1) Based on an analysis of audio semantic content, audio clips are classified into six classes: silence, noise, pure speech, non-pure speech, environmental sound and music. Structural units are defined at different levels of the audio structure, and a new hierarchical framework for audio structure analysis is proposed. Because audio classification is essentially a pattern recognition problem, a framework for audio classification and segmentation is proposed on the basis of pattern recognition theory, and the key technologies in this field are discussed.
(2) Audio features are studied in depth at the frame level and the clip level. To reduce misclassifications, four new features are proposed: silence ratio, high zero-crossing-rate (High-ZCR) ratio, low-frequency-energy ratio and spectrum flux. The performance of these features is evaluated with an SVM-based audio classifier implemented in this dissertation.
(3) Three mainstream SVM training algorithms are studied in detail: the quadratic optimization algorithm, the decomposition algorithm and the incremental algorithm. A comparison of the performance of these three classic training algorithms shows the advantages and applicability of the SVMlight algorithm over the other SVM training algorithms and provides a basis for the SVM-based classifier. The shortcomings of traditional rule-based classifiers are analysed, and a new SVM decision method based on a decision tree is proposed. Using this method, a multi-class audio classifier based on SVM decision trees is designed and implemented, integrating the advantages of rule-based and SVM-based classifiers.
(4) The shortcomings of traditional audio segmentation methods are studied in detail. An audio segmentation algorithm based on entropy and dynamic programming is proposed and applied to the results of clip classification. Experiments show that the entropy-based method performs better than traditional methods.
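The four clip-level features in item (2) are named but not defined in the abstract. As a rough illustration only, the following Python sketch computes them from a mono signal given as a list of floats. The frame length, hop size, silence threshold, the "1.5 times the mean ZCR" rule, the 1 kHz cutoff and the log-spectrum flux formula are all assumptions, not values taken from the dissertation; the low-frequency-energy ratio definition in particular is a guess. The DFT is computed naively to keep the sketch standard-library only; a practical version would use an FFT.

```python
import cmath
import math
from typing import Dict, List, Sequence


def split_frames(signal: Sequence[float], frame_len: int, hop: int) -> List[Sequence[float]]:
    """Cut the signal into overlapping frames; a trailing partial frame is dropped."""
    return [signal[i:i + frame_len] for i in range(0, len(signal) - frame_len + 1, hop)]


def zero_crossing_rate(frame: Sequence[float]) -> float:
    """Fraction of adjacent sample pairs whose signs differ."""
    changes = sum((a >= 0) != (b >= 0) for a, b in zip(frame, frame[1:]))
    return changes / (len(frame) - 1)


def short_time_energy(frame: Sequence[float]) -> float:
    return sum(x * x for x in frame) / len(frame)


def magnitude_spectrum(frame: Sequence[float]) -> List[float]:
    """DFT magnitudes for bins 0..N/2, computed naively in O(N^2)."""
    n = len(frame)
    return [abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
            for k in range(n // 2 + 1)]


def clip_features(signal: Sequence[float], sample_rate: int, frame_len: int = 256,
                  hop: int = 128, silence_energy: float = 1e-4,
                  low_cutoff_hz: float = 1000.0) -> Dict[str, float]:
    frames = split_frames(signal, frame_len, hop)
    if not frames:
        raise ValueError("signal is shorter than one frame")
    zcrs = [zero_crossing_rate(f) for f in frames]
    energies = [short_time_energy(f) for f in frames]
    spectra = [magnitude_spectrum(f) for f in frames]

    # Silence ratio: share of frames whose energy is below a fixed threshold.
    silence_ratio = sum(e < silence_energy for e in energies) / len(frames)

    # High-ZCR ratio: share of frames whose ZCR exceeds 1.5x the clip's mean ZCR.
    mean_zcr = sum(zcrs) / len(zcrs)
    high_zcr_ratio = sum(z > 1.5 * mean_zcr for z in zcrs) / len(frames)

    # Low-frequency-energy ratio (assumed definition): spectral energy in bins at or
    # below low_cutoff_hz divided by total spectral energy, pooled over the clip.
    cutoff_bin = int(low_cutoff_hz * frame_len / sample_rate)
    low = sum(m * m for s in spectra for m in s[:cutoff_bin + 1])
    total = sum(m * m for s in spectra for m in s)
    low_freq_energy_ratio = low / total if total > 0 else 0.0

    # Spectrum flux: mean squared difference of log magnitudes between adjacent
    # frames, skipping the DC bin; delta avoids log(0) on silent frames.
    delta = 1e-6
    diffs = [(math.log(a + delta) - math.log(b + delta)) ** 2
             for prev, cur in zip(spectra, spectra[1:])
             for a, b in zip(cur[1:], prev[1:])]
    spectrum_flux = sum(diffs) / len(diffs) if diffs else 0.0

    return {"silence_ratio": silence_ratio, "high_zcr_ratio": high_zcr_ratio,
            "low_freq_energy_ratio": low_freq_energy_ratio, "spectrum_flux": spectrum_flux}
```

The four values returned per clip would form the input vector for a clip-level classifier such as the SVM-based one described in item (3).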
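Item (3) combines rule-based decisions with SVM decisions inside a decision tree. A minimal sketch of that idea, assuming one binary linear SVM per internal SVM node: the SVM here is trained with a Pegasos-style stochastic sub-gradient method as a simple stand-in for SVMlight, and the tree has just one rule node (silence) and one SVM node (speech vs. music). The class names `LinearSVM`, `Leaf` and `Node`, the feature layout, the 0.5 silence threshold, the toy training vectors and the tree shape are all hypothetical; the abstract does not specify the dissertation's six-class tree.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Union


class LinearSVM:
    """Binary linear SVM trained with Pegasos-style stochastic sub-gradient descent.

    Hypothetical stand-in for an SVMlight-trained node: labels are +1/-1 and the
    bias is learned as the weight of an appended constant 1.0 feature.
    """

    def __init__(self, lam: float = 0.01, epochs: int = 1000) -> None:
        self.lam = lam          # regularization strength
        self.epochs = epochs    # passes over the training set
        self.w: List[float] = []

    @staticmethod
    def _augment(x: Sequence[float]) -> List[float]:
        return list(x) + [1.0]

    def decision(self, x: Sequence[float]) -> float:
        """Signed score; positive means the +1 class."""
        return sum(wi * xi for wi, xi in zip(self.w, self._augment(x)))

    def fit(self, xs: Sequence[Sequence[float]], ys: Sequence[int]) -> "LinearSVM":
        self.w = [0.0] * (len(xs[0]) + 1)
        t = 0
        for _ in range(self.epochs):
            for x, y in zip(xs, ys):  # fixed order keeps the sketch deterministic
                t += 1
                eta = 1.0 / (self.lam * t)
                violated = y * self.decision(x) < 1.0  # hinge loss is active
                # The L2 regularizer's gradient shrinks w every step...
                self.w = [(1.0 - eta * self.lam) * wi for wi in self.w]
                # ...and a margin violation also pulls w toward y * x.
                if violated:
                    self.w = [wi + eta * y * xi for wi, xi in zip(self.w, self._augment(x))]
        return self


@dataclass
class Leaf:
    label: str


@dataclass
class Node:
    goes_left: Callable[[Sequence[float]], bool]  # a hand-written rule or an SVM sign test
    left: "Tree"
    right: "Tree"


Tree = Union[Leaf, Node]


def classify(tree: Tree, x: Sequence[float]) -> str:
    """Walk from the root to a leaf, taking one binary decision per level."""
    while isinstance(tree, Node):
        tree = tree.left if tree.goes_left(x) else tree.right
    return tree.label


# Hypothetical feature layout:
# [silence_ratio, high_zcr_ratio, low_freq_energy_ratio, spectrum_flux]
speech = [[0.1, 0.40, 0.6, 0.9], [0.1, 0.35, 0.7, 0.8], [0.0, 0.45, 0.5, 1.0]]
music = [[0.0, 0.05, 0.4, 0.2], [0.1, 0.10, 0.3, 0.3], [0.0, 0.08, 0.5, 0.1]]
speech_vs_music = LinearSVM().fit(speech + music, [1] * 3 + [-1] * 3)

tree = Node(
    goes_left=lambda x: x[0] > 0.5,  # rule node: more than half the frames are silent
    left=Leaf("silence"),
    right=Node(
        goes_left=lambda x: speech_vs_music.decision(x) > 0,  # SVM node
        left=Leaf("speech"),
        right=Leaf("music"),
    ),
)
```

Extending this toward the six classes in item (1) would mean adding further SVM nodes, for example pure vs. non-pure speech or music vs. environmental sound; which splits the dissertation actually uses, and in what order, is not stated in the abstract.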
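Item (4) does not state the exact objective of the entropy-based segmentation. One plausible reading, sketched below purely under that assumption, takes the sequence of clip-level class labels and uses dynamic programming to choose segment boundaries that minimize the label entropy inside each segment (weighted by segment length) plus a fixed penalty per segment, so that an isolated misclassified clip can be absorbed into its neighbours. The function names `label_entropy` and `segment` and the `penalty` parameter are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def label_entropy(labels: Sequence[str]) -> float:
    """Shannon entropy (bits) of the label distribution inside one segment."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def segment(labels: Sequence[str], penalty: float) -> List[Tuple[int, int, str]]:
    """Split clip labels into segments minimizing
    sum(len(seg) * entropy(seg)) + penalty * number_of_segments.

    Returns (start, end, majority_label) triples; end is exclusive.
    """
    n = len(labels)
    best = [0.0] + [math.inf] * n  # best[j]: minimal cost of segmenting labels[:j]
    back = [0] * (n + 1)           # back[j]: start of the last segment in that optimum
    for j in range(1, n + 1):
        for i in range(j):
            seg = labels[i:j]
            cost = best[i] + len(seg) * label_entropy(seg) + penalty
            if cost < best[j]:
                best[j], back[j] = cost, i
    # Trace the optimal boundaries back from the end of the sequence.
    segments = []
    j = n
    while j > 0:
        i = back[j]
        segments.append((i, j, Counter(labels[i:j]).most_common(1)[0][0]))
        j = i
    return segments[::-1]
```

With six speech clips, one music clip and six more speech clips, a penalty of 3.0 returns a single speech segment covering all thirteen clips, while a penalty of 0.5 keeps the music clip as its own segment; the penalty thus controls how aggressively short label runs are smoothed. Recomputing the entropy for every candidate segment makes this sketch O(n^3); incremental label counts would reduce it to O(n^2).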
Keywords/Search Tags: Audio Classification and Segmentation, Support Vector Machine, Feature Extraction, Audio Shot, SVM-Decision Tree, Entropy