Automatic Detection Of English Speech

Posted on: 2014-10-15  Degree: Master  Type: Thesis
Country: China  Candidate: L L Li  Full Text: PDF
GTID: 2268330401985395  Subject: Computer software and theory
Abstract/Summary:
Prosody plays a very important role among the parameters of speech synthesis, and high-quality synthetic speech usually relies on a large corpus. Fast and accurate prosodic annotation of the corpus is therefore very important for speech synthesis. Corpus annotation requires substantial manpower, resources, and time, and long, high-intensity manual annotation easily leads to poor consistency and errors. This raises costs and places higher demands on rapid corpus construction. The diverse demands on speech synthesis also require that a speech library adapt to various hardware and software environments and provide voice sources in different accents, moods, and speaking styles. If the amount of manual prosodic annotation could be minimized, the cost of corpus construction would fall sharply, and with it the cost of speech synthesis. To this end, we take a general-purpose speech corpus, process its text, minimize the manual annotation, and then realize automatic annotation by training models with supervised and semi-supervised learning. The main content of this paper is as follows:

1) According to the characteristics of the general corpus, we use a GMM audio classification method and speaker classification software to classify and segment the original audio and to remove music and noise, obtaining clean speech. We select and extract word-level acoustic parameters of the speech and combine them with the preprocessed text to obtain a large number of unlabeled feature files. To enable supervised and semi-supervised training, a certain number of sentences are labeled manually with the ToBI tagging system.

2) To obtain the relevant acoustic-prosodic features, acoustic parameters are extracted with the Praat software. Several machine learning methods, namely maximum entropy, AdaBoost, and J48, are then trained on the labeled samples, and the resulting prosodic annotation results are compared and analyzed.

3) Semi-supervised learning needs only a very small number of labeled examples and learns automatically from a large number of unlabeled examples. We use the semi-supervised co-training method to construct a prosodic annotation system based on pitch accent, detail the design and simplification of the training model, and compare its performance with the results of the supervised learning methods. The co-training algorithm does not require a large number of labeled files. Compared with supervised learning, it makes more efficient use of the large number of unlabeled files.

The corpus chosen is a general corpus, without special recording or processing, so some extensions need to be made to the original corpus. The acoustic processing and text processing are the most basic steps and do not require much time or effort. A key to the automatic labeling system is the selection and extraction of prosodic features and acoustic parameters, using the corresponding rules relating rhythm and acoustic parameters to improve the automatic annotation. The introduction of the co-training algorithm greatly reduces the amount of manual annotation, achieving minimal annotation.
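
The co-training idea described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the two feature views (standing in for word-level acoustic and text-derived features), the nearest-centroid classifier, the labels "accent" and "none", the synthetic data, and all function names are assumptions made only for this illustration. Each round, a classifier is trained on each view from the currently labeled words, and each one labels the few unlabeled words it is most confident about.

```python
import math
import random


def train_centroids(rows, labels):
    """Return {label: mean feature vector} for one feature view."""
    sums, counts = {}, {}
    for row, label in zip(rows, labels):
        acc = sums.setdefault(label, [0.0] * len(row))
        for i, value in enumerate(row):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {lab: [s / counts[lab] for s in acc] for lab, acc in sums.items()}


def predict(centroids, row):
    """Return (label, margin); margin is the gap between the two nearest centroids."""
    ranked = sorted((math.dist(row, c), lab) for lab, c in centroids.items())
    margin = ranked[1][0] - ranked[0][0] if len(ranked) > 1 else 0.0
    return ranked[0][1], margin


def co_train(view_a, view_b, labels, rounds=20, per_round=2):
    """labels[i] is a class label or None (unlabeled). Returns a filled-in copy."""
    labels = list(labels)
    for _ in range(rounds):
        known = [i for i, y in enumerate(labels) if y is not None]
        if len(known) == len(labels):
            break
        known_labels = [labels[i] for i in known]
        for view in (view_a, view_b):
            # Train on this view only, using the labels known at round start.
            model = train_centroids([view[i] for i in known], known_labels)
            # Score the examples that are still unlabeled right now.
            scored = [(predict(model, view[i]), i)
                      for i, y in enumerate(labels) if y is None]
            scored.sort(key=lambda item: item[0][1], reverse=True)
            # Keep only this view's most confident predictions.
            for (label, _margin), i in scored[:per_round]:
                labels[i] = label
    return labels


def make_demo(seed=0, per_class=20, seeds_per_class=2):
    """Synthetic two-view data (hypothetical); only a few items per class are labeled."""
    rng = random.Random(seed)
    view_a, view_b, truth, labels = [], [], [], []
    for label, center in (("accent", 1.0), ("none", -1.0)):
        for k in range(per_class):
            view_a.append([rng.gauss(center, 0.3), rng.gauss(center, 0.3)])
            view_b.append([rng.gauss(center, 0.3), rng.gauss(center, 0.3)])
            truth.append(label)
            labels.append(label if k < seeds_per_class else None)
    return view_a, view_b, truth, labels


view_a, view_b, truth, labels = make_demo()
completed = co_train(view_a, view_b, labels)
accuracy = sum(p == t for p, t in zip(completed, truth)) / len(truth)
print(f"labeled by hand: {sum(y is not None for y in labels)}, accuracy: {accuracy:.2f}")
```

In this sketch only four of the forty synthetic words carry manual labels; the rest are labeled by the two view-specific classifiers. A real system would replace the nearest-centroid model with a stronger learner and the synthetic vectors with features extracted from speech and text.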
Keywords/Search Tags: Maximum Entropy, Praat, J48, Semi-supervised learning, Co-training, Active learning