Bimodal speech recognition, which adds lip-movement information to conventional speech recognition systems to improve the recognition rate, has become a research focus. This paper studies a fast method for extracting lip features from video and implements it with the Constrained Local Model (CLM). First, the lip shape model is built: the lips in the database images are labeled to obtain landmark coordinates, the coordinates are aligned by Procrustes analysis, and Principal Component Analysis (PCA) is applied to the aligned data to obtain the shape model. Second, the lip intensity model is built: patches around the landmark coordinates are extracted as training data, and a linear SVM is trained on these patches to obtain the intensity model. Finally, the built models are used to search for the lip features iteratively until the correct feature locations are found. The experimental results show that the algorithm performs fast feature extraction well, and that the labeling scheme with 19 features extracts the features faster than the previous labeling scheme.
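The shape-model step (Procrustes alignment followed by PCA) can be sketched in Python with NumPy. This is a minimal sketch, not the paper's implementation: it assumes the labeled lips are given as an (m, n, 2) array of landmark coordinates (for example n = 19), and the function names, the 95% retained-variance threshold and the iteration count are illustrative assumptions.

```python
import numpy as np


def align_to(shape, ref):
    """Ordinary Procrustes: similarity-align an (n, 2) shape onto an (n, 2) reference."""
    a = shape - shape.mean(axis=0)
    b = ref - ref.mean(axis=0)
    # Optimal rotation from the SVD of the 2x2 cross-covariance matrix.
    u, s, vt = np.linalg.svd(a.T @ b)
    d = np.diag([1.0, np.sign(np.linalg.det(u @ vt))])  # forbid reflections
    rot = u @ d @ vt
    scale = np.trace(np.diag(s) @ d) / np.sum(a ** 2)
    return scale * a @ rot + ref.mean(axis=0)


def generalized_procrustes(shapes, iters=20, tol=1e-10):
    """Iteratively align all shapes to their unit-norm mean shape."""
    centred = shapes - shapes.mean(axis=1, keepdims=True)
    # The reference starts as the first shape, scaled to unit norm.
    ref = centred[0] / np.linalg.norm(centred[0])
    for _ in range(iters):
        aligned = np.stack([align_to(s, ref) for s in centred])
        new_ref = aligned.mean(axis=0)
        new_ref /= np.linalg.norm(new_ref)
        done = np.linalg.norm(new_ref - ref) < tol
        ref = new_ref
        if done:
            break
    return aligned


def build_shape_model(shapes, variance_kept=0.95):
    """PCA shape model: mean shape, modes P with shape (k, 2n), and their variances."""
    aligned = generalized_procrustes(np.asarray(shapes, dtype=float))
    X = aligned.reshape(len(aligned), -1)  # one row (x1, y1, ..., xn, yn) per example
    mean = X.mean(axis=0)
    _, s, vt = np.linalg.svd(X - mean, full_matrices=False)
    var = s ** 2 / (len(X) - 1)  # variance along each principal axis
    ratio = np.cumsum(var) / var.sum()
    k = min(int(np.searchsorted(ratio, variance_kept)) + 1, len(var))
    return mean, vt[:k], var[:k]
```

A shape x is then approximated as mean + P.T @ b with parameters b = P @ (x - mean), so a lip contour is described by k numbers instead of 2n coordinates.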
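The intensity-model step can be illustrated by a linear SVM "patch expert" whose score w · patch + b is evaluated at every position in a small search window around a landmark. This is a hedged sketch: the per-patch zero-mean, unit-variance normalisation, the dual coordinate descent solver (a standard way to train a linear SVM, swapped in here; the paper does not say which solver it uses), and all function names and parameters are assumptions for illustration.

```python
import numpy as np


def extract_patch(image, center, half):
    """Flattened (2*half+1)^2 patch around integer (row, col); assumes it lies inside the image."""
    r, c = center
    return image[r - half:r + half + 1, c - half:c + half + 1].astype(float).ravel()


def normalise(p):
    """Zero-mean, unit-variance patch (assumed, to reduce sensitivity to lighting)."""
    p = p - p.mean()
    s = p.std()
    return p / s if s > 0 else p


def train_linear_svm(X, y, C=1.0, epochs=50, seed=0):
    """Hinge-loss linear SVM trained by dual coordinate descent; labels y are +1/-1.

    The bias is folded in as a constant feature, so it is also regularised.
    """
    Xb = np.hstack([np.asarray(X, dtype=float), np.ones((len(X), 1))])
    y = np.asarray(y, dtype=float)
    alpha = np.zeros(len(Xb))
    w = np.zeros(Xb.shape[1])
    q = np.sum(Xb * Xb, axis=1)  # diagonal of the Gram matrix (always > 0 here)
    rng = np.random.default_rng(seed)
    for _ in range(epochs):
        for i in rng.permutation(len(Xb)):
            g = y[i] * (w @ Xb[i]) - 1.0  # gradient of the dual objective in alpha_i
            new = min(max(alpha[i] - g / q[i], 0.0), C)
            w += (new - alpha[i]) * y[i] * Xb[i]
            alpha[i] = new
    return w[:-1], w[-1]


def response_map(image, w, b, center, half, radius):
    """SVM score for every candidate position within +/- radius of center."""
    r0, c0 = center
    out = np.empty((2 * radius + 1, 2 * radius + 1))
    for i, dr in enumerate(range(-radius, radius + 1)):
        for j, dc in enumerate(range(-radius, radius + 1)):
            patch = normalise(extract_patch(image, (r0 + dr, c0 + dc), half))
            out[i, j] = w @ patch + b
    return out
```

In a CLM, one such classifier would typically be trained per landmark, with normalised patches centred on the labeled point as positives and patches displaced from it as negatives; the peak of each response map then suggests where that landmark should move.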
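The iterative search that combines the two models can be sketched as a deliberately simplified loop: each landmark moves to its best-scoring point on a small grid, and the resulting shape is projected onto the PCA shape model with its parameters clamped to ±3 standard deviations, repeating until the shape stops changing. This is an illustration under stated assumptions, not the paper's search procedure: it assumes the landmarks are already in the shape model's reference frame (a full CLM also estimates a global similarity transform), and the `response` callback, grid radius and step, and ±3σ limit are hypothetical choices.

```python
import numpy as np


def fit_shape(response, mean, P, var, init_b=None, radius=10, step=0.01,
              iters=50, tol=1e-9, limit=3.0):
    """Simplified CLM-style search in the model frame (no global pose).

    response(k, points) must return one score per row of points (candidate
    positions for landmark k); higher means a better match.
    """
    b = np.zeros(len(var)) if init_b is None else np.asarray(init_b, dtype=float)
    grid = np.arange(-radius, radius + 1)
    # Candidate displacements on a square grid; offset (0, 0) is included exactly.
    offsets = step * np.stack(np.meshgrid(grid, grid), axis=-1).reshape(-1, 2)
    bound = limit * np.sqrt(var)
    x = (mean + P.T @ b).reshape(-1, 2)
    for _ in range(iters):
        # 1) Local search: move every landmark to its best-scoring candidate.
        moved = np.empty_like(x)
        for k, p in enumerate(x):
            cand = p + offsets
            moved[k] = cand[np.argmax(response(k, cand))]
        # 2) Shape constraint: project onto the PCA model and clamp the parameters.
        b = np.clip(P @ (moved.ravel() - mean), -bound, bound)
        new_x = (mean + P.T @ b).reshape(-1, 2)
        change = np.linalg.norm(new_x - x)
        x = new_x
        if change < tol:
            break
    return x, b
```

Because every landmark may stay where it is and both the projection and the clamping are projections onto convex sets, a response that peaks at a shape the model can represent is approached without the fitting error ever increasing; the ±3σ clamp is what keeps noisy response maps from producing implausible lip shapes.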