A Study On Automatic Speech Segmentation Method Based On Human Perception Characteristics

Posted on:2015-06-27

Degree:Master

Type:Thesis

Country:China

Candidate:K Y Zhou

Full Text:PDF

GTID:2298330452459875

Subject:Software engineering

Abstract/Summary:

With the development of speech processing techniques, more and more applications require highly accurate speech segmentation. Traditionally, manual segmentation by specially trained labellers has been considered the most reliable and precise way to obtain segments. However, it is time-consuming and labor-intensive, especially when the required speech database is large. An appropriate automatic segmentation method is therefore more feasible and practical.

Several methods for automatic speech segmentation based on the Hidden Markov Model (HMM) have been proposed; they divide continuous speech into segments (e.g. phonemes) by means of Viterbi decoding. However, the resulting automatic phoneme boundaries deviate considerably from the boundaries that humans perceive, which casts doubt on the accuracy of automatic speech segmentation.

This research addresses the accuracy problem with the Spectrum Target Prediction Model (STPM, proposed by Masato Akagi), which is based on human perception characteristics. The conceptual idea is to predict the spectral target in human perception in each short-term interval (50 milliseconds) and then to choose as boundaries the points in time at which the target changes. Experimental results show that STPM yields candidates for precise automatic phoneme boundaries. However, there are too many candidates, which makes it difficult to select the suitable ones. This research therefore proposes a method combining HMM and STPM. First, relatively rough phoneme boundaries are obtained by HMM. Meanwhile, an errors list file is built from the training sets; it records the preceding and following phonemes of every boundary appearing in the training sets, together with the average, maximum, and minimum errors relative to the more precise manual labelling. Then, taking the HMM boundaries as reference points, more precise automatic boundaries are selected from the STPM candidates, using different methods according to the different average errors in the errors list file.

Under the widely used objective evaluation standard, the percentage of automatically labelled boundaries that fall within a 20-millisecond threshold of the manually labelled ones, the proposed method improves the result from 90.02% for HMM to 92.07%. The results under the other objective evaluation standards used in this research also improve to a certain extent. However, related topics such as subjective evaluation remain as future work. Moreover, there is a gap between the experimental results and the theoretical upper limit (100%) under the same standard, which indicates that the proposed method needs further optimization.
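The 20-millisecond evaluation standard and the idea of using HMM boundaries as reference points for choosing among STPM candidates can be illustrated with a minimal sketch. This is not the thesis's implementation: the function names, the 30 ms search window, and the "nearest candidate" rule are assumptions made only for illustration (the abstract says the selection method varies with the average error in the errors list file, but does not spell the rules out). All times are in milliseconds, and the boundary values are invented.

```python
from bisect import bisect_left


def snap_to_candidates(hmm_boundaries, candidates, window=30):
    """Move each HMM boundary to the nearest candidate within `window` ms.

    Hypothetical selection rule: the thesis chooses among STPM candidates
    with methods that depend on the errors list file; here we simply take
    the nearest candidate and keep the HMM boundary if none is close enough.
    """
    candidates = sorted(candidates)
    refined = []
    for b in hmm_boundaries:
        i = bisect_left(candidates, b)
        # Only the candidates just below and just above b can be nearest.
        neighbours = candidates[max(i - 1, 0):i + 1]
        best = min(neighbours, key=lambda c: abs(c - b), default=None)
        if best is not None and abs(best - b) <= window:
            refined.append(best)
        else:
            refined.append(b)
    return refined


def within_tolerance_rate(auto, manual, tolerance=20):
    """Percentage of automatic boundaries within `tolerance` ms of the
    manual boundary they are paired with (paired by position)."""
    if len(auto) != len(manual):
        raise ValueError("boundary lists must have the same length")
    if not auto:
        return 0.0
    hits = sum(1 for a, m in zip(auto, manual) if abs(a - m) <= tolerance)
    return 100.0 * hits / len(auto)


# Invented example data (milliseconds).
manual = [100, 250, 400, 620]
hmm = [115, 240, 430, 600]
stpm_candidates = [98, 160, 252, 300, 405, 500, 615]

refined = snap_to_candidates(hmm, stpm_candidates)
hmm_rate = within_tolerance_rate(hmm, manual)
refined_rate = within_tolerance_rate(refined, manual)
```

On this toy data the HMM boundaries score 75.0% and the snapped boundaries score 100.0%; the numbers only show how the metric is computed and say nothing about the thesis's actual results.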

Keywords/Search Tags:

"[automatic speech segmentation]", "[human perception characteristics]", "[spectrum target prediction model]", "[errors list file]"

Related items

1	Research On Ionospheric Clutter Interference Characteristic In HFSWR
2	Effect Of Phase-locking Response In Auditory Midbrain On Speech Perception And Automatic Initial/final Segmentation In Continuous Mandarin Speech
3	Research On Robust Automatic Segmentation Of Dialectal Speech
4	Research On Deep Learning Based Speech Dereverberation Method
5	Research On Automatic Target Extraction In SAR Imagery
6	Research On The Technology Of Automatic Segmentation For Text-To-Speech System
7	Research On Spectrum Sensing And Prediction Based On Hidden Markov Model In The Cognitive Radio
8	Research On Automatic Segmentation Technology And Automatic Segmentation Of Speech In Dai Language Speech Synthesis System
9	Research And Application Of Target Perception Calculation Model
10	Research On Problems Of Text-To-Speech System