Design And Implementation Of Speech Semi-automatic Annotation System

Posted on: 2016-03-01
Degree: Master
Type: Thesis
Country: China
Candidate: Y Z Yang
GTID: 2308330470980063
Subject: Electronics and Communications Engineering
Abstract/Summary:
With the rapid development of information technology, users expect more from speech synthesis and speech recognition. Laboratory research is increasingly applied in real life, and a variety of voice system products continue to emerge. Building a large speech corpus is an indispensable part of designing a good voice system, and the quality of a speech corpus depends on the accuracy of its annotation. Corpus annotation therefore plays a key role in speech research. Large-scale manual annotation is not only time-consuming and costly, but also produces many errors, because the human ear is not sensitive to syllable boundaries within words or sentences. This thesis implements a semi-automatic annotation system for speech corpora that automatically computes unit boundaries and the pitch envelope, and also lets the user manually correct the results of the automatic annotation. The main work and original contributions of the thesis are as follows.

Firstly, the thesis implements an automatic annotation algorithm for the boundaries of speech units. For recorded speech files without labels, a hidden Markov model (HMM)-based forced alignment algorithm is used to align the time boundaries automatically. The deterministic annealing expectation maximization (DAEM) algorithm is introduced into the embedded training step of HMM training to improve the accuracy of forced alignment when labeling the boundaries of speech units.

Secondly, the thesis implements an automatic annotation algorithm for the pitch contour of the speech signal. After the time boundaries of each speech unit have been obtained, the STRAIGHT (Speech Transformation and Representation using Adaptive Interpolation of weiGHTed spectrum) algorithm is used to extract the fundamental frequencies of the speech signal. A smoothing algorithm is then applied to the extracted fundamental frequencies to obtain a smoothed pitch contour. After that, the peak points of the speech signal are determined by calculating the distance between adjacent peaks according to the pitch period. Because this method can produce some wrong peak points, the thesis adds a function for correcting them manually by clicking the mouse, which makes the labeling semi-automatic.

Thirdly, the thesis designs and implements a semi-automatic labeling system for speech signals. The system uses a graphical user interface to draw the boundary of each speech unit on the speech waveform, and converts the pitch contour extracted by the STRAIGHT algorithm into labeled peak points on the waveform. On this basis, the thesis implements manual modification of both the speech unit boundaries and the peak points, giving a more accurate and more visual semi-automatic speech labeling system.

Finally, the thesis studies the tones of the Lanzhou dialect using experimental phonetic analysis. The semi-automatic annotation system is used to obtain accurate syllable boundaries and pitch contours for the Lanzhou dialect. Experimental phonetic methods are then applied to the pitch of each tone to verify traditional conclusions about the tones of the Lanzhou dialect.
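The smoothing and peak-marking steps described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the abstract does not name its smoothing algorithm, so a median filter is assumed here, and the function names `smooth_f0` and `mark_peaks`, the `width` and `tolerance` parameters, and the convention that 0 marks an unvoiced frame are all hypothetical. F0 extraction by STRAIGHT is not reproduced; the sketch starts from an F0 track that is already available.

```python
import math
from statistics import median


def smooth_f0(f0, width=5):
    """Median-smooth an F0 track (Hz per frame).

    Assumption: a value <= 0 marks an unvoiced frame, which is kept
    as 0.0 and excluded from the smoothing window of its neighbours.
    """
    half = width // 2
    smoothed = []
    for i, value in enumerate(f0):
        if value <= 0:
            smoothed.append(0.0)
            continue
        window = [v for v in f0[max(0, i - half): i + half + 1] if v > 0]
        smoothed.append(median(window))
    return smoothed


def mark_peaks(samples, sample_rate, f0_hz, tolerance=0.3):
    """Mark one waveform peak per pitch period in a voiced segment.

    Assumption: F0 is roughly constant over the segment. Each new peak
    is searched within +/- tolerance * period of the position expected
    one pitch period after the previous peak.
    """
    period = sample_rate / f0_hz  # pitch period in samples
    first_end = min(len(samples), int(round(period)) + 1)
    # The first peak is the largest sample within the first pitch period.
    peaks = [max(range(first_end), key=lambda i: samples[i])]
    while True:
        expected = peaks[-1] + period
        lo = int(math.ceil(expected - tolerance * period))
        hi = int(math.floor(expected + tolerance * period))
        if hi >= len(samples):
            break
        peaks.append(max(range(lo, hi + 1), key=lambda i: samples[i]))
    return peaks
```

In a tool like the one described, the indices returned by `mark_peaks` would be drawn on the waveform, and a mouse click would replace a wrongly placed index with the sample position chosen by the user.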
Keywords/Search Tags: DAEM algorithm, STRAIGHT algorithm, Peak point marking, Semi-automatic annotation, Experimental phonetics analysis