The Frontend Of Speech Recognition

Posted on: 2007-10-13
Degree: Master
Type: Thesis
Country: China
Candidate: B Li
GTID: 2178360182996013
Subject: Software engineering
Abstract/Summary:
The process of computer speech recognition is similar to the process of human speech recognition. It is divided into three parts, among them speech feature extraction and the acoustic model. The front end of speech recognition, which performs speech feature extraction, is a very important part of the system. The aim of speech feature extraction is to turn the signal into parameters and then extract features from them.

The front end of speech recognition consists of preprocessing and feature extraction. The typical function of preprocessing is speech endpoint detection. Endpoint detection uses simple time-domain parameters, short-time energy or zero-crossing rate, to delete silence, which reduces the computation required by subsequent processing. The function of feature extraction is to extract the parameters of the speech signal in various ways. The most widely used features at present are Linear Predictive Cepstral Coefficients (LPCC), derived from LPC, and Mel-Frequency Cepstral Coefficients (MFCC), based on the Mel scale. However, the human auditory system has another special capability: human beings are able to recognize speech amazingly well in high levels of background noise. The auditory system adapts to loud signals and filters them out, and it exhibits masking. The performance of automatic speech recognition (ASR) systems, on the other hand, degrades dramatically with increasing noise.

This thesis primarily models the process of feature extraction and simultaneous masking. Two features are extracted from the speech signal: LPCC and MFCC. The extraction proceeds through the following steps: pre-emphasis, windowing, power spectrum, Mel spectrum, Mel cepstrum, and framing, which finally yields the features we need. To measure recognition accuracy, we feed the features into Sphinx4 and then analyze the results. At the same time, we model the process of simultaneous masking. There are four stages in each parallel channel of processing: a wideband filter, a compression stage, a narrow-band filter, and an expansion stage.
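
The preprocessing and early feature-extraction steps named above can be illustrated with a minimal Python sketch. It is not the thesis implementation: the function names, the pre-emphasis coefficient 0.97, the frame sizes, and the silence thresholds are all assumptions chosen for illustration, and the Mel mapping uses the commonly quoted formula 2595 * log10(1 + f / 700). The sketch covers pre-emphasis, framing, Hamming windowing, and silence removal by short-time energy or zero-crossing rate.

```python
import math

def pre_emphasis(signal, alpha=0.97):
    """First-order high-pass filter y[n] = x[n] - alpha * x[n-1] (alpha is an assumed value)."""
    if not signal:
        return []
    return [signal[0]] + [signal[n] - alpha * signal[n - 1]
                          for n in range(1, len(signal))]

def frame_signal(signal, frame_len, hop):
    """Split the signal into frames of frame_len samples, advancing by hop; a trailing partial frame is dropped."""
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, hop)]

def hamming(frame):
    """Apply a Hamming window to one frame."""
    n = len(frame)
    if n < 2:
        return list(frame)
    return [x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
            for i, x in enumerate(frame)]

def short_time_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(x * x for x in frame) / len(frame)

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ (frame needs at least 2 samples)."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)

def hz_to_mel(freq_hz):
    """Map a frequency in Hz to the Mel scale."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)

def remove_silence(signal, frame_len=200, hop=80,
                   energy_thresh=0.01, zcr_thresh=0.3):
    """Keep frames whose energy or zero-crossing rate exceeds a threshold.

    Both thresholds are hypothetical; a real detector would tune them to the data.
    """
    return [f for f in frame_signal(signal, frame_len, hop)
            if short_time_energy(f) > energy_thresh
            or zero_crossing_rate(f) > zcr_thresh]
```

A front end built on these pieces would typically discard silent frames first and then window each remaining frame before computing its power spectrum; the later spectral steps are omitted here to keep the sketch short.
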
The threshold of audibility for one sound is raised by the presence of another (masking) sound.

This thesis also compares feature extraction with simultaneous masking. We observe that the recall obtained with the companding front end is consistently better than that obtained with MFCCs. The insertion-adjusted accuracy of the companding front end remains below that of MFCCs; however, at very low SNRs, even this number is significantly better than the baseline.

To implement the front end effectively and accurately, we try various techniques.
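
The abstract does not define recall or insertion-adjusted accuracy. The sketch below assumes the usual word-level definitions from speech recognition scoring: recall counts reference words that were neither substituted nor deleted, and insertion-adjusted accuracy additionally subtracts inserted words. The function names and the error counts in the usage note are hypothetical and are not results from the thesis.

```python
def recall(n_ref, substitutions, deletions):
    """Fraction of reference words recognized correctly (assumed definition)."""
    return (n_ref - substitutions - deletions) / n_ref

def insertion_adjusted_accuracy(n_ref, substitutions, deletions, insertions):
    """Recall penalized by the number of inserted words (assumed definition)."""
    return (n_ref - substitutions - deletions - insertions) / n_ref
```

With 100 reference words, 10 substitutions, 5 deletions, and 20 insertions, recall is 0.85 and insertion-adjusted accuracy is 0.65. Under these definitions a front end can therefore score higher on recall than another while scoring lower on insertion-adjusted accuracy, if it produces more insertions.
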
Keywords/Search Tags: ASR, MFCC, feature extraction