The Frontend Of Speech Recognition

Posted on:2007-10-13

Degree:Master

Type:Thesis

Country:China

Candidate:B Li

GTID:2178360182996013

Subject:Software engineering

Abstract/Summary:

Computer speech recognition follows the same process as human speech recognition. It is divided into three parts, among them speech feature extraction and the acoustic model. The front end, which performs speech feature extraction, is a very important part of speech recognition. The aim of speech feature extraction is to turn the signal into parameters and then extract features from them. The front end of speech recognition consists of preprocessing and feature extraction. The typical function of preprocessing is speech endpoint detection, which uses simple time-domain parameters, short-time energy or zero-crossing rate, to delete silence and thereby reduce the amount of computation in subsequent processing. Feature extraction derives parameters from the speech signal in various ways. The most widely used features at present are Linear Prediction Cepstral Coefficients (LPCC), based on LPC, and Mel-Frequency Cepstral Coefficients (MFCC), based on the mel scale.

However, the human auditory system has further special properties. Human beings are able to recognize speech amazingly well in high levels of background noise: the auditory system adapts to loud signals, filters them out, and exhibits masking. In contrast, the performance of automatic speech recognition (ASR) systems degrades dramatically with increasing noise.

This paper primarily models the process of feature extraction and simultaneous masking. Two features are extracted from the speech signal: LPCC and MFCC. The extraction has five steps: pre-emphasis, framing and windowing, power spectrum, mel spectrum, and mel cepstrum, which yield the features we need. To assess recognition accuracy, we feed the features into Sphinx4 and analyze the results. At the same time, we model the process of simultaneous masking. Each parallel processing channel has four stages: a wideband filter, a compression stage, a narrow-band filter, and an expansion stage.
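The four stages of one processing channel can be illustrated with a minimal sketch. It is not the thesis implementation: the second-order band-pass filters, the rectify-and-smooth envelope detector, the power-law exponent `n`, and every function and parameter name below are assumptions chosen for illustration. Compression scales the signal by its envelope raised to `n - 1`, and expansion scales by the envelope raised to `(1 - n) / n`, so a single steady tone at the channel's center frequency should leave the channel at roughly its input amplitude.

```python
import math

def bandpass(x, f0, q, fs):
    """Second-order band-pass filter with unity gain at f0 (assumed filter shape)."""
    w0 = 2.0 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2.0 * q)
    b0, b2 = alpha, -alpha
    a0, a1, a2 = 1.0 + alpha, -2.0 * math.cos(w0), 1.0 - alpha
    y = []
    x1 = x2 = y1 = y2 = 0.0
    for xn in x:
        yn = (b0 * xn + b2 * x2 - a1 * y1 - a2 * y2) / a0
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        y.append(yn)
    return y

def envelope(x, smooth=0.99):
    """Rectify the signal and smooth it with a one-pole low-pass filter."""
    env, out = 0.0, []
    for xn in x:
        env = smooth * env + (1.0 - smooth) * abs(xn)
        out.append(env)
    return out

def power_law(x, exponent, eps=1e-9):
    """Scale each sample by (envelope + eps) ** exponent."""
    return [xn * (e + eps) ** exponent for xn, e in zip(x, envelope(x))]

def companding_channel(x, f0, fs, n=0.3, q_wide=1.0, q_narrow=6.0):
    """One channel: wideband filter, compression, narrow-band filter, expansion."""
    wide = bandpass(x, f0, q_wide, fs)           # stage 1: wideband filter
    compressed = power_law(wide, n - 1.0)        # stage 2: compression
    narrow = bandpass(compressed, f0, q_narrow, fs)  # stage 3: narrow-band filter
    return power_law(narrow, (1.0 - n) / n)      # stage 4: expansion
```

A full front end would run many such channels in parallel at different center frequencies and combine their outputs; that part is left out here. The channel produces large start-up transients while the envelope detectors settle, so only the later part of the output is meaningful in this sketch.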
The threshold of audibility for one sound is raised by the presence of another (masking) sound. This paper also compares the feature-extraction front end with the simultaneous-masking front end. We observe that the recall obtained with the companding front end is consistently better than that obtained with MFCCs. The insertion-adjusted accuracy of the companding front end remains below that of MFCCs; however, at very low SNRs, even this number is significantly better than the baseline. To implement the front end effectively and accurately, we tried a variety of techniques.
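The MFCC extraction steps named in the abstract (pre-emphasis, framing and windowing, power spectrum, mel spectrum, mel cepstrum) can be sketched in plain Python as follows. This is an illustrative sketch only, not the thesis code and not the Sphinx4 front end: the pre-emphasis coefficient, Hamming window, frame sizes, filter count, number of coefficients, and all function names are assumptions.

```python
import cmath
import math

def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def pre_emphasis(x, coeff=0.97):
    """Boost high frequencies: y[i] = x[i] - coeff * x[i-1]."""
    return [x[0]] + [x[i] - coeff * x[i - 1] for i in range(1, len(x))]

def frames(x, size, step):
    """Split the signal into overlapping frames of `size` samples."""
    return [x[i:i + size] for i in range(0, len(x) - size + 1, step)]

def hamming(frame):
    n = len(frame)
    return [s * (0.54 - 0.46 * math.cos(2.0 * math.pi * i / (n - 1)))
            for i, s in enumerate(frame)]

def power_spectrum(frame):
    """Power spectrum via a direct DFT (slow but dependency-free)."""
    n = len(frame)
    spec = []
    for k in range(n // 2 + 1):
        acc = sum(s * cmath.exp(-2j * math.pi * k * i / n)
                  for i, s in enumerate(frame))
        spec.append(abs(acc) ** 2 / n)
    return spec

def mel_filterbank(n_filters, n_fft, fs):
    """Triangular filters spaced evenly on the mel scale."""
    top = hz_to_mel(fs / 2.0)
    edges = [mel_to_hz(top * i / (n_filters + 1)) for i in range(n_filters + 2)]
    bins = [int(math.floor((n_fft + 1) * f / fs)) for f in edges]
    bank = []
    for m in range(1, n_filters + 1):
        lo, mid, hi = bins[m - 1], bins[m], bins[m + 1]
        row = [0.0] * (n_fft // 2 + 1)
        for k in range(lo, mid):
            row[k] = (k - lo) / (mid - lo)
        for k in range(mid, hi):
            row[k] = (hi - k) / (hi - mid)
        bank.append(row)
    return bank

def dct(values, n_coeffs):
    """Unnormalized DCT-II, keeping the first n_coeffs coefficients."""
    n = len(values)
    return [sum(v * math.cos(math.pi * c * (i + 0.5) / n)
                for i, v in enumerate(values))
            for c in range(n_coeffs)]

def mfcc(signal, fs, frame_len=0.025, frame_step=0.010,
         n_filters=20, n_coeffs=13):
    size, step = int(round(frame_len * fs)), int(round(frame_step * fs))
    bank = mel_filterbank(n_filters, size, fs)
    feats = []
    for frame in frames(pre_emphasis(signal), size, step):
        spec = power_spectrum(hamming(frame))                  # power spectrum
        energies = [sum(w * p for w, p in zip(row, spec)) for row in bank]
        log_mel = [math.log(e + 1e-10) for e in energies]     # mel spectrum
        feats.append(dct(log_mel, n_coeffs))                   # mel cepstrum
    return feats
```

Calling `mfcc(samples, fs)` on a list of float samples returns one list of `n_coeffs` cepstral coefficients per frame. A practical front end would replace the direct DFT with an FFT; the direct form is used here only to keep the sketch free of external libraries.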

Keywords/Search Tags:

ASR, MFCC, feature extraction

Related items

1	MFCC Feature Extraction Research Based On ICA And Its Implementation On DSP
2	The Study Of Speaker Recognition System Based On MFCC
3	Study Of Speaker Recognition System Based On MFCC And GMM
4	Wav File-based Voice Feature Extraction Method To Improve Research
5	Research On Speech And Fractal Feature Extraction Of Underwater Passive Targets
6	Research On Feature Extraction And Classification Of Ship Noise And Whale Sound
7	The Frontend Of Speech Recognition
8	The Research Of Feature Parameters Extraction For Speaker Recognition
9	Design Of A Low-power Speech MFCC Feature Extraction Circuit In Mixed-signal Domain
10	Voiceprint Speaker Multidimensional Feature Extraction Platform Design