Design And Implementation Of Automatic Speech Recognition System Based On Lip Reading Technology

Posted on: 2015-01-20  Degree: Master  Type: Thesis
Country: China  Candidate: H Liu  Full Text: PDF
GTID: 2308330473950266  Subject: Software engineering
Abstract/Summary:
In the field of automatic speech recognition (ASR), most research has focused on the acoustic signal. In real-world conditions, such systems often fail to reach the expected performance because of noise. Visual information can therefore play an important role in improving speech recognition performance, especially in noisy environments. This thesis focuses on lip reading using visual information only. Previous research has taken two main approaches to lip shape extraction. The first is the model-based, or geometric, method; examples of such features are the width and height of the lips (and their temporal derivatives), which can be estimated from the images. The second is the pixel-based, or appearance-based, method, whose features are derived from the intensity values of the raw pixels. The first approach is more intuitive, but the data reduction involved typically causes a substantial loss of information. The second loses little information, but the high dimensionality of the image space is a computational disadvantage.
In this thesis, as in the model-based method, the width and height of the inner lip are measured to represent different lip shapes. The advantage of this approach is that the lip features are easy to obtain, which saves computation time. Because the inner lip region is darker than the other lip areas, a spatial filter was designed to enhance it. The filter does not follow a common design, but its performance was acceptable, and the enhancement technique might be useful in other areas as well. After image enhancement, a Gaussian filter was used to remove noise, yielding a clear contour of the inner lip. Four different kernels were then used to measure the height and width of the inner lip.
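The smoothing and measurement steps described above can be illustrated with a minimal sketch. It does not reproduce the thesis's custom enhancement filter or its four measurement kernels, whose designs are not given in this abstract. As stand-ins, it assumes a 3x3 Gaussian kernel, a fixed darkness threshold, and a bounding box around the dark pixels. The function names, the threshold value, and the synthetic frame are all hypothetical.

```python
import math


def gaussian_kernel(radius=1, sigma=1.0):
    """Build a normalized (2*radius+1) x (2*radius+1) Gaussian kernel."""
    offsets = range(-radius, radius + 1)
    kernel = [[math.exp(-(x * x + y * y) / (2.0 * sigma * sigma)) for x in offsets]
              for y in offsets]
    total = sum(sum(row) for row in kernel)
    return [[v / total for v in row] for row in kernel]


def smooth(image, kernel):
    """Filter a grayscale image (list of rows) with the kernel, replicating edges."""
    height, width = len(image), len(image[0])
    r = len(kernel) // 2
    out = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            acc = 0.0
            for dy in range(-r, r + 1):
                for dx in range(-r, r + 1):
                    # Clamp coordinates so border pixels reuse the nearest edge value.
                    yy = min(max(y + dy, 0), height - 1)
                    xx = min(max(x + dx, 0), width - 1)
                    acc += image[yy][xx] * kernel[dy + r][dx + r]
            out[y][x] = acc
    return out


def inner_lip_size(image, threshold=80.0):
    """Return (width, height) of the bounding box of dark pixels after smoothing.

    Assumption: the inner lip is the only region darker than `threshold`.
    """
    smoothed = smooth(image, gaussian_kernel())
    dark = [(x, y) for y, row in enumerate(smoothed)
            for x, v in enumerate(row) if v < threshold]
    if not dark:
        return (0, 0)
    xs = [x for x, _ in dark]
    ys = [y for _, y in dark]
    return (max(xs) - min(xs) + 1, max(ys) - min(ys) + 1)


# Synthetic 9x11 frame: bright background (200) with a dark 5x3 opening (10).
frame = [[10 if 3 <= y <= 5 and 3 <= x <= 7 else 200 for x in range(11)]
         for y in range(9)]
print(inner_lip_size(frame))
```

Applied to every frame of a video, a function like this would yield one (width, height) pair per frame, which is the kind of per-word measurement sequence the abstract goes on to store in a database.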
Using these sets of data, a database was built that tells the system how words and measurements correspond to each other. Once the database was built, the system could recognize single words as well as multiple words spoken in video files. When a test video file is input, the system processes the images and compares the resulting data with the database. The recognition result is the database entry with the least deviation from the test data. Although the system achieved some success, it has potential limitations, such as requirements on the working environment and on the position of the user's head.
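The least-deviation lookup can be sketched as follows. The abstract does not define the deviation measure, so this example assumes the sum of absolute differences between two equal-length sequences of (width, height) pairs. The database contents and function names are hypothetical illustrations, not data from the thesis.

```python
def deviation(sample, reference):
    """Sum of absolute width and height differences, frame by frame.

    Assumption: both sequences have the same number of frames.
    """
    if len(sample) != len(reference):
        raise ValueError("sequences must have the same number of frames")
    return sum(abs(sw - rw) + abs(sh - rh)
               for (sw, sh), (rw, rh) in zip(sample, reference))


def recognize(sample, database):
    """Return the word whose stored sequence deviates least from the sample."""
    return min(database, key=lambda word: deviation(sample, database[word]))


# Hypothetical database: word -> per-frame (width, height) of the inner lip.
database = {
    "open": [(2, 1), (6, 4), (3, 1)],
    "me": [(2, 1), (3, 1), (4, 2)],
}

# Measurements from a hypothetical test video.
sample = [(2, 1), (5, 4), (3, 2)]
print(recognize(sample, database))
```

A real system would also need to align sequences of different lengths, for example by resampling or dynamic time warping; that step is left out here to keep the matching rule itself visible.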
Keywords/Search Tags: Automatic Speech Recognition (ASR), Lip Reading, Convolution Kernel, Gaussian Filter, Database