Design And Implementation Of Automatic Speech Recognition System Based On Lip Reading Technology

Posted on:2015-01-20

Degree:Master

Type:Thesis

Country:China

Candidate:H Liu

Full Text:PDF

GTID:2308330473950266

Subject:Software engineering

Abstract/Summary:

In the field of automatic speech recognition (ASR), most research has focused on the acoustic signal. In real-world conditions, such systems often failed to reach the expected performance because of noise. Visual information can therefore play an important role in improving speech recognition systems, especially in noisy environments. This thesis focused on lip reading using visual information only.

Previous research followed two main approaches to lip shape extraction. The first was the model-based (geometric) method, whose features included the width and height of the lips, and their temporal derivatives, estimated from the images. The second was the pixel-based (appearance-based) method, whose features were derived from the intensity values of the raw pixels. The first approach was more intuitive, but the data reduction involved typically caused a substantial loss of information. The second lost little information, but the high dimensionality of the image space was a computational disadvantage.

In this thesis, as in the model-based method, the width and height of the inner lip were measured to represent different lip shapes. The advantage of this approach was that the lip features could be obtained easily, which saved computation time. Because the inner lip region is darker than the other lip areas, a spatial filter was designed to enhance it. The way the filter was used in this system did not follow a common approach, but the performance was acceptable, and the enhancement technique might also be useful in other areas. After the image enhancement, a Gaussian filter was used to remove noise, so that a clear contour of the inner lip could be obtained. Four different kernels were then used to measure the height and width of the inner lip.
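
The measurement step described above can be illustrated with a minimal sketch. The abstract does not specify the spatial filter or the four measurement kernels, so everything here is an assumption made for illustration: a 3x3 Gaussian kernel, a fixed darkness threshold (`dark_threshold`), and a simple bounding box of the dark pixels standing in for the thesis's kernels. The function names are hypothetical.

```python
# 3x3 Gaussian kernel (weights sum to 16); an assumed size, not taken from the thesis.
GAUSS_3X3 = [[1, 2, 1],
             [2, 4, 2],
             [1, 2, 1]]


def gaussian_smooth(img):
    """Smooth a grayscale image (list of rows) with a 3x3 Gaussian kernel.

    Border pixels are handled by clamping coordinates to the image edge.
    """
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy = min(max(y + dy, 0), h - 1)
                    xx = min(max(x + dx, 0), w - 1)
                    acc += GAUSS_3X3[dy + 1][dx + 1] * img[yy][xx]
            out[y][x] = acc / 16.0
    return out


def inner_lip_size(img, dark_threshold=60):
    """Return (width, height) of the bounding box of dark pixels, or (0, 0).

    The bounding box is a stand-in (assumption) for the four measurement
    kernels mentioned in the abstract, which are not described there.
    """
    smooth = gaussian_smooth(img)
    dark = [(x, y)
            for y, row in enumerate(smooth)
            for x, v in enumerate(row)
            if v < dark_threshold]
    if not dark:
        return (0, 0)
    xs = [x for x, _ in dark]
    ys = [y for _, y in dark]
    return (max(xs) - min(xs) + 1, max(ys) - min(ys) + 1)
```

Calling `inner_lip_size` on each video frame would yield one (width, height) pair per frame, which is the kind of per-frame feature the abstract describes. A fixed threshold is the simplest choice and is likely to be sensitive to lighting, which is consistent with the environmental limitations the abstract mentions.
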
Using the resulting sets of data, a database was built to tell the system how words and data corresponded to each other. Once the database was built, the system was able to recognize single words as well as multiple words spoken in video files. When a test video file was input, the system processed the images and compared the resulting data with the database, then reported as the recognition result the entry with the least deviation from the input. Although the system achieved some results, it had potential limitations, such as requirements on the working environment and on the position of the user’s head.
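
The least-deviation matching step can be sketched as follows. The abstract does not define how deviation is computed or how the database is stored, so this example assumes a sum of squared differences over per-frame (width, height) pairs and a plain dictionary mapping each word to a template sequence. It also assumes the input and the template have the same number of frames; templates of a different length are skipped. The names `deviation` and `recognize` are hypothetical.

```python
def deviation(a, b):
    """Sum of squared differences between two equal-length feature sequences.

    Each sequence is a list of (width, height) pairs, one per video frame.
    """
    return sum((wa - wb) ** 2 + (ha - hb) ** 2
               for (wa, ha), (wb, hb) in zip(a, b))


def recognize(features, database):
    """Return the word whose template deviates least from `features`.

    `database` maps word -> template sequence. Templates whose length differs
    from the input are ignored (a simplifying assumption). Returns None when
    no template is comparable.
    """
    candidates = {word: tmpl for word, tmpl in database.items()
                  if len(tmpl) == len(features)}
    if not candidates:
        return None
    return min(candidates, key=lambda word: deviation(features, candidates[word]))


# Toy database with made-up measurements, for illustration only.
database = {
    "open": [(10, 2), (12, 8), (10, 2)],
    "closed": [(10, 1), (10, 1), (10, 1)],
}
print(recognize([(11, 2), (12, 7), (10, 3)], database))
```

Real recordings of the same word rarely have the same number of frames, so a practical system would need some form of temporal alignment or resampling before comparison; the abstract does not say how the thesis handled this.
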

Keywords/Search Tags:

Automatic Speech Recognition (ASR), Lip Reading, Convolution Kernel, Gaussian Filter, Database

Related items

1	Research On Speech Recognition Based On Convolution Neural Network
2	Research And System Design Of Speech Recognition Based On Improved CNN
3	Mixtures of inverse covariances: Covariance modeling for Gaussian mixtures with applications to automatic speech recognition
4	Acoustic modeling for automatic speech recognition: Deriving discriminative Gaussian networks
5	Research Of Kernel Function Of Support Vector Machine And Its Application In Speech Recognition
6	Research And Implementation Of Gaussian Mixture Model-based Speech Emotion Recognition
7	Research On Speech Emotion Recognition Based On Kernel Function
8	Knowledge Distillation For Speech-assisted Lip Reading
9	A Design Execution And Recognition Testing Of Multimedia English Database Of Second Language
10	Database Construction And Algorithm Research Of Visual Speech Recognition Based On Deep Learning