
A Study On Automatic Speech Segmentation Method Based On Human Perception Characteristics

Posted on: 2015-06-27    Degree: Master    Type: Thesis
Country: China    Candidate: K Y Zhou
GTID: 2298330452459875    Subject: Software engineering
Abstract/Summary:
With the development of speech processing techniques, more and more applications require highly accurate speech segmentation. Traditionally, manual segmentation by specially trained labellers has been considered the most reliable and precise way to obtain the segments. However, it is time-consuming and labor-intensive, especially when the speech database is large. An appropriate automatic segmentation method is therefore more feasible and practical.

Several automatic speech segmentation methods have been proposed based on the Hidden Markov Model (HMM); they divide continuous speech into segments (e.g., phonemes) by means of Viterbi decoding. However, the automatically obtained phoneme boundaries to a large extent do not match those perceived by humans, which casts doubt on the accuracy of automatic speech segmentation.

This research addresses the accuracy problem with the Spectrum Target Prediction Model (STPM, proposed by Masato Akagi), which is based on human perception characteristics. The idea is to predict the spectral target perceived by humans in each short-term interval (50 milliseconds) and to choose as boundaries the points in time at which the target changes. Experiments with STPM show that its output contains candidates for precise automatic phoneme boundaries. However, there are too many candidates, which makes it difficult to select the suitable ones. This research therefore proposes a method that combines HMM and STPM. First, relatively rough phoneme boundaries are obtained by HMM. Meanwhile, an errors list file is built from the training set: for every boundary appearing in the training set, it records the preceding and following phonemes together with the average, maximum, and minimum errors relative to the more precise manual labelling.
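The abstract does not specify the file format or exactly how the errors are measured. The following is a minimal sketch of how such an errors list could be built, assuming each training boundary is given as a (preceding phoneme, following phoneme, HMM time, manual time) tuple in milliseconds, that the error is the absolute difference between the HMM and manual boundary times, and that the file is tab-separated. `BoundaryStats`, `build_error_list` and `write_error_list` are hypothetical names for illustration, not code from the thesis.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class BoundaryStats:
    """Error statistics for one (preceding, following) phoneme pair."""
    count: int
    mean_ms: float
    max_ms: float
    min_ms: float


def build_error_list(boundaries):
    """Group training boundaries by phoneme pair and summarise their errors.

    boundaries: iterable of (prev_phone, next_phone, hmm_ms, manual_ms).
    Assumption: error = |HMM boundary - manual boundary| in milliseconds.
    """
    errors = defaultdict(list)
    for prev_phone, next_phone, hmm_ms, manual_ms in boundaries:
        errors[(prev_phone, next_phone)].append(abs(hmm_ms - manual_ms))
    return {
        pair: BoundaryStats(len(errs), sum(errs) / len(errs), max(errs), min(errs))
        for pair, errs in errors.items()
    }


def write_error_list(stats, path):
    """Write one tab-separated line per phoneme pair (hypothetical layout)."""
    with open(path, "w", encoding="utf-8") as f:
        for (prev_phone, next_phone), s in sorted(stats.items(), key=lambda kv: kv[0]):
            f.write(
                f"{prev_phone}\t{next_phone}\t{s.count}\t"
                f"{s.mean_ms:.1f}\t{s.max_ms:.1f}\t{s.min_ms:.1f}\n"
            )
```

For example, two a/k boundaries with HMM/manual times of 120/112 ms and 300/305 ms give the pair (a, k) a mean error of 6.5 ms, a maximum of 8 ms and a minimum of 5 ms.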
Then, taking the HMM boundaries as reference points, more precise automatic boundaries are selected from the STPM candidates, using different methods depending on the average errors recorded in the errors list file.

Under the widely used objective criterion, the percentage of automatically labelled boundaries that lie within 20 milliseconds of the manually labelled ones, the proposed method improves the result from 90.02% (HMM alone) to 92.07%. The other objective evaluation measures used in this research also improve to a certain extent. However, related topics such as subjective evaluation remain as future work. Moreover, a gap remains between the experimental results and the theoretical upper bound (100%) under the same criterion, which indicates that the proposed method needs further optimization.
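The 20 ms criterion can be computed as in the sketch below. It assumes the automatic and manual boundaries are paired one-to-one (as when the same phoneme sequence is aligned by both) and that a boundary exactly 20 ms away counts as a hit; `boundary_accuracy` is a hypothetical helper name, not taken from the thesis.

```python
def boundary_accuracy(auto_ms, manual_ms, tolerance_ms=20.0):
    """Percentage of automatic boundaries within tolerance_ms of the manual ones.

    Assumes auto_ms[i] and manual_ms[i] mark the same phoneme boundary,
    and treats |difference| == tolerance_ms as within tolerance.
    """
    if len(auto_ms) != len(manual_ms):
        raise ValueError("boundaries must be paired one-to-one")
    if not auto_ms:
        raise ValueError("no boundaries to evaluate")
    hits = sum(1 for a, m in zip(auto_ms, manual_ms) if abs(a - m) <= tolerance_ms)
    return 100.0 * hits / len(auto_ms)
```

For automatic boundaries at 100, 250, 410 and 600 ms against manual ones at 110, 275, 405 and 600 ms, the offsets are 10, 25, 5 and 0 ms, so three of four fall within tolerance and the function returns 75.0.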
Keywords/Search Tags: automatic speech segmentation, human perception characteristics, spectrum target prediction model, errors list file