Research On Label-minimized Audio Classification And Sentence Segmentation

Posted on:2011-08-23

Degree:Master

Type:Thesis

Country:China

Candidate:Q Zhao

GTID:2178330332463513

Subject:Computer software and theory

Abstract/Summary:

Automatic construction of speech databases is particularly important for speech synthesis. It requires identifying the category of the input audio so that each category can be processed differently, and segmenting the processed audio into sentences, which then serve as the input to a subsequent automatic syllable segmentation system. Audio classification and sentence segmentation are the key technologies for solving these problems. However, existing audio classification and sentence segmentation methods require large amounts of manually labeled data to train models and evaluate results; such data are expensive, time-consuming, and laborious to prepare, which greatly increases the cost of building the system. Research on label-minimized audio classification and sentence segmentation therefore has high research value and practical significance. This thesis studies content-based audio classification and sentence segmentation without speech recognition in depth and systematically, covering feature selection, label minimization, improvements to key techniques, and related applications. The main research work is as follows.
(1) The main sources of audio information and the semantic content of audio are analyzed. Based on the characteristics of the news broadcast audio used, audio clips are classified into three classes: pure speech, pure music, and speech mixed with music. After studying which frame-level and clip-level audio features best distinguish these classes, new features are introduced in addition to basic features such as frequency energy, zero-crossing rate (ZCR), and MFCCs: the silence ratio, the high ZCR ratio, and the low frequency energy ratio.
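Several of the frame-level and clip-level features named in (1) can be sketched as follows. This is a minimal pure-Python illustration under stated assumptions, not the thesis's implementation: the high ZCR ratio follows a common definition (the share of frames whose ZCR exceeds 1.5 times the clip mean), the silence ratio counts frames whose energy falls below a caller-chosen threshold, and all helper names are hypothetical. The low frequency energy ratio is omitted because the abstract does not define it.

```python
import math


def frames(signal, frame_len, hop):
    """Split a list of samples into frames of frame_len samples, advancing by hop.
    A trailing partial frame is dropped."""
    return [signal[i:i + frame_len] for i in range(0, len(signal) - frame_len + 1, hop)]


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ (frame needs >= 2 samples)."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def short_time_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(x * x for x in frame) / len(frame)


def silence_ratio(frame_list, energy_threshold):
    """Share of frames whose energy is below energy_threshold.
    Assumption: a fixed threshold; the thesis's silence criterion is not given."""
    energies = [short_time_energy(f) for f in frame_list]
    return sum(1 for e in energies if e < energy_threshold) / len(energies)


def high_zcr_ratio(frame_list, factor=1.5):
    """Share of frames whose ZCR exceeds factor times the clip's mean ZCR.
    Assumption: factor=1.5, a value commonly used for this feature."""
    zcrs = [zero_crossing_rate(f) for f in frame_list]
    mean_zcr = sum(zcrs) / len(zcrs)
    return sum(1 for z in zcrs if z > factor * mean_zcr) / len(zcrs)
```

For example, a clip whose first half is digital silence and whose second half is a 1 kHz tone sampled at 16 kHz, cut into ten 160-sample frames, gives a silence ratio of 0.5 (with a threshold of 1e-4) and a high ZCR ratio of 0.5. Speech, which alternates voiced sounds, fricatives, and pauses, tends to produce more extreme values of these ratios than sustained music, which is why such clip-level statistics help separate the three classes.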
The first innovation of the thesis is the use of a co-training algorithm based on maximum entropy (Maxent) for audio classification, chosen after an in-depth analysis of co-training's ability to minimize the amount of labeled data while maintaining classification accuracy. Experimental results demonstrate the effectiveness of co-training for audio classification.
(2) To achieve label minimization, the co-training algorithm based on the maximum entropy classifier is studied in detail. Co-training is the core of label minimization; by comparing how different parameter settings affect classification accuracy and by analyzing their time and computational cost, the optimal parameter set is determined. In addition, the Maxent classification procedure is adapted to the numerical features used in audio classification and sentence segmentation. Experimental results confirm the performance of co-training in binary classification and in minimizing the amount of labeled data, which provides a solid foundation for implementing label-minimized audio classification and sentence segmentation systems.
(3) Based on an in-depth analysis of the shortcomings of sentence segmentation methods that rely heavily on speech recognition results, and of the important role prosodic features play in sentence segmentation, semantic sentence segmentation is performed directly on the audio: each frame is classified as vowel, consonant, or pause (V/C/P), and prosodic features, pause features, and rate of speech (ROS) are used, grouped into two feature sets. A label-data generation approach with a checking mechanism, based on forced alignment and speech recognition, is introduced to provide labeled data automatically and make sentence segmentation label-free.
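The Maxent-based co-training described in (1) and (2) can be illustrated with a small sketch. It assumes each example has two feature views (for instance, two separate feature sets), uses binary logistic regression as the maximum-entropy classifier, and follows a simplified form of the Blum and Mitchell co-training loop: in each round, each view's classifier labels the unlabeled examples it is most confident about and adds them to the shared labeled pool. The function names and the defaults for `rounds`, `per_round`, learning rate, and epochs are hypothetical; the abstract does not report the parameter values the thesis selected.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class BinaryMaxent:
    """Binary maximum-entropy classifier (equivalent to logistic regression),
    fit by batch gradient ascent on the L2-regularized log-likelihood."""

    def __init__(self, lr=0.5, epochs=300, l2=1e-3):
        self.lr, self.epochs, self.l2 = lr, epochs, l2
        self.w, self.b = [], 0.0

    def prob(self, x):
        """P(y = 1 | x)."""
        return sigmoid(self.b + sum(wj * xj for wj, xj in zip(self.w, x)))

    def fit(self, X, y):
        n, d = len(X), len(X[0])
        self.w, self.b = [0.0] * d, 0.0
        for _ in range(self.epochs):
            gw, gb = [0.0] * d, 0.0
            for xi, yi in zip(X, y):
                err = yi - self.prob(xi)  # gradient of the log-likelihood w.r.t. the logit
                for j in range(d):
                    gw[j] += err * xi[j]
                gb += err
            # The bias is not regularized; weights get a small L2 penalty.
            self.w = [wj + self.lr * (g / n - self.l2 * wj) for wj, g in zip(self.w, gw)]
            self.b += self.lr * gb / n
        return self


def fit_view(view, labels):
    """Train a classifier on one view using all currently labeled indices."""
    idx = sorted(labels)
    return BinaryMaxent().fit([view[i] for i in idx], [labels[i] for i in idx])


def co_train(view1, view2, seed_labels, unlabeled, rounds=10, per_round=2):
    """view1[i] and view2[i] are the two feature vectors of example i;
    seed_labels maps index -> 0/1 for the small hand-labeled seed set."""
    labels = dict(seed_labels)
    pool = list(unlabeled)
    for _ in range(rounds):
        for view in (view1, view2):
            if not pool:
                break
            clf = fit_view(view, labels)
            # Most confident first: largest distance of P(y=1|x) from 0.5.
            pool.sort(key=lambda i: abs(clf.prob(view[i]) - 0.5), reverse=True)
            for i in pool[:per_round]:
                labels[i] = int(clf.prob(view[i]) >= 0.5)
            pool = pool[per_round:]
    return fit_view(view1, labels), fit_view(view2, labels), labels


def co_predict(h1, h2, x1, x2):
    """Combine the two views by averaging their posterior probabilities."""
    return int((h1.prob(x1) + h2.prob(x2)) / 2 >= 0.5)
```

With only a handful of seed labels and two views that each separate the classes reasonably well, the loop labels the rest of the pool itself, which is the sense in which co-training minimizes manual labeling. Parameters such as `rounds` and `per_round` trade accuracy against training time, the kind of trade-off item (2) describes tuning.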
In addition, Maxent-based co-training is applied to address the shortage of labeled data and to detect sentence boundaries without manual labels or speech recognition. Finally, because co-training alone cannot confirm whether a detected boundary is a real sentence boundary, a checking mechanism is proposed that compares the proportion of vowels in the text with the proportion of vowels in the audio after V/C/P classification. It picks out the real sentence boundaries from those detected by co-training, and these boundaries can be used directly by subsequent processing steps and systems. The second innovation of the thesis is sentence segmentation with zero manual labeling, together with the checking mechanism, which can...
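The vowel-proportion check described in (3) can be sketched as below. The abstract does not say how the two proportions are computed or compared, so this sketch assumes: the text side is the share of vowel phones in a phone transcription of the expected sentence, the audio side is the share of `V` frames among the non-pause frames of the segment ending at the candidate boundary, and a boundary is accepted when the two differ by at most a hypothetical `tolerance`. All function names and the toy phone inventory are illustrative.

```python
VOWEL = "V"
PAUSE = "P"


def text_vowel_proportion(phones, vowel_set):
    """Share of vowel phones in the (assumed) phone transcription of the sentence."""
    return sum(1 for p in phones if p in vowel_set) / len(phones)


def audio_vowel_proportion(frame_labels):
    """Share of 'V' frames among non-pause frames after V/C/P classification."""
    speech = [lab for lab in frame_labels if lab != PAUSE]
    if not speech:
        return 0.0
    return sum(1 for lab in speech if lab == VOWEL) / len(speech)


def is_real_boundary(phones, frame_labels, vowel_set, tolerance=0.1):
    """Accept a detected boundary when the text-side and audio-side vowel
    proportions agree within tolerance (a hypothetical threshold)."""
    gap = abs(text_vowel_proportion(phones, vowel_set) - audio_vowel_proportion(frame_labels))
    return gap <= tolerance


def filter_boundaries(candidates, vowel_set, tolerance=0.1):
    """candidates: list of (boundary_time, phones, frame_labels) tuples.
    Returns the times of the boundaries that pass the check."""
    return [t for t, phones, labs in candidates
            if is_real_boundary(phones, labs, vowel_set, tolerance)]
```

For instance, with the toy transcription `["n", "i", "h", "ao"]` and vowel set `{"i", "ao"}`, the text-side proportion is 0.5; a segment labeled `CVVPCVCP` also has a vowel proportion of 0.5 among its non-pause frames and passes, while a segment labeled `CCCVP` (proportion 0.25) is rejected.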

Keywords/Search Tags:

label-minimized, audio classification, sentence segmentation without speech recognition, co-training

Related items

1	Research On Automatic Construction Of Speech Corpus And Speech Minimized Labeling
2	Study On Automatic Construction Of Speech Database~2
3	Study Of FM Radio Security System Based On Speech Recognition Technology
4	Research Of Environmental Audio Multi-label Classification Method Based On Deep Learning
5	Research On Automatic Speech-Text Alignment For Mongolian Long Audio
6	Research Of Audio Classification And Segmentation
7	Key Technology Research On Audio Information Hiding And Information Security Application For Speech Recognition
8	Research On Video Semantic Content Analysis
9	Sentence-level Emotion Classification
10	Researching Audio Segmentation And Classification Methods Of The Broadcasting Repertoire