Speaking-face detection here relies on video information alone: the speaker is identified from lip motion, without using the audio track. The techniques involved are shot segmentation of the video, face detection and tracking, lip localization, and the lip-motion decision. Name labeling additionally requires textual information; given the characteristics of subtitles and the script, the two text sources must be fused.

For fusing the subtitles with the script, this paper introduces a dynamic time warping algorithm that performs the alignment at the word level, and it achieves good results.

For face detection and tracking, taking accuracy, efficiency, and code reuse into account, this paper combines the AdaBoost algorithm and the MeanShift algorithm from the OpenCV vision library. This combination has been verified experimentally, achieves good results, and is the method we use for face-sequence extraction.

Mouth detection has long been a topic in the lip-reading field; this paper brings it into the lip-extraction stage of speaker detection. Building on methods proposed in the literature, we extract the lip region by lip color and improve the procedure so that a more accurate lip area is obtained; tests show good results.

In previous studies, the lip-motion detection used for speaker detection is less elaborate than in lip reading: most methods simply compute the difference between two images over the mouth region and apply a threshold to decide whether the lips are moving. This paper introduces a machine learning method that extracts several features from the lip region and trains a classifier to make the decision. Experiments demonstrate the accuracy and robustness of this method.

The speaking-face detection methods in the literature operate on single frames, that is, they judge whether a face in one frame is speaking. They cannot handle the case where only some images in a face sequence show lip motion and the sequence as a whole is not speech. We therefore propose a method that operates on the selected image sequence and decides whether that sequence is speaking over a period of time. This formulation is closer to real situations and also achieves good results.
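As a concrete illustration of the word-level fusion step, the sketch below aligns a subtitle word sequence with a script word sequence by dynamic time warping. The 0/1 word-match cost and the backtracking rule are illustrative assumptions; the paper's exact cost function is not reproduced here.

```python
# A minimal word-level DTW sketch for aligning a subtitle word sequence with
# a script word sequence. The 0/1 word-match cost is an illustrative
# assumption; the paper does not spell out its exact cost function.
def dtw_align(subtitle_words, script_words):
    n, m = len(subtitle_words), len(script_words)
    INF = float("inf")
    # dtw[i][j]: cost of aligning the first i subtitle words
    # with the first j script words.
    dtw = [[INF] * (m + 1) for _ in range(n + 1)]
    dtw[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0.0 if subtitle_words[i - 1] == script_words[j - 1] else 1.0
            dtw[i][j] = cost + min(dtw[i - 1][j],      # skip a subtitle word
                                   dtw[i][j - 1],      # skip a script word
                                   dtw[i - 1][j - 1])  # match / substitute
    # Backtrack to recover the aligned word pairs.
    pairs, i, j = [], n, m
    while i > 0 and j > 0:
        step = min(dtw[i - 1][j - 1], dtw[i - 1][j], dtw[i][j - 1])
        if step == dtw[i - 1][j - 1]:
            pairs.append((subtitle_words[i - 1], script_words[j - 1]))
            i, j = i - 1, j - 1
        elif step == dtw[i - 1][j]:
            i -= 1
        else:
            j -= 1
    return dtw[n][m], list(reversed(pairs))
```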
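The face-sequence extraction step can be sketched with OpenCV as follows, using a Haar cascade (an AdaBoost-based detector) for the initial detection and meanShift for tracking within a shot. The cascade file, the single-face assumption, and the detector parameters are illustrative choices, not values taken from the paper.

```python
# A minimal sketch of face detection and tracking with OpenCV, assuming a
# Haar cascade shipped with OpenCV and one face per shot. The cascade
# (AdaBoost) gives the initial face box; meanShift then tracks it through
# the following frames via hue-histogram back-projection.
import cv2

def detect_and_track(video_path):
    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    cap = cv2.VideoCapture(video_path)

    ok, frame = cap.read()
    if not ok:
        return []
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        return []
    x, y, w, h = faces[0]                  # take the first detected face
    track_window = (x, y, w, h)

    # Hue histogram of the detected face region, used for back-projection.
    hsv_roi = cv2.cvtColor(frame[y:y + h, x:x + w], cv2.COLOR_BGR2HSV)
    hist = cv2.calcHist([hsv_roi], [0], None, [180], [0, 180])
    cv2.normalize(hist, hist, 0, 255, cv2.NORM_MINMAX)

    term_crit = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 1)
    boxes = [track_window]
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
        back_proj = cv2.calcBackProject([hsv], [0], hist, [0, 180], 1)
        _, track_window = cv2.meanShift(back_proj, track_window, term_crit)
        boxes.append(track_window)
    cap.release()
    return boxes                           # one face box per frame of the shot
```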
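For the lip-location step, a minimal colour-based sketch is given below. It assumes the lips lie in the lower half of the detected face box and stand out in the Cr channel of YCrCb; the improved lip-colour transform described in the paper is not reproduced here.

```python
# A sketch of lip-region extraction by colour, under the assumption that
# lips are redder than skin and therefore separable by an adaptive threshold
# on the Cr channel. The threshold rule is illustrative only.
import cv2
import numpy as np

def extract_lip_region(face_bgr):
    h, w = face_bgr.shape[:2]
    lower = face_bgr[h // 2:, :]                      # search the lower half
    ycrcb = cv2.cvtColor(lower, cv2.COLOR_BGR2YCrCb)
    cr = ycrcb[:, :, 1].astype(np.float32)
    # Adaptive threshold: keep pixels whose Cr is well above the local mean.
    mask = (cr > cr.mean() + cr.std()).astype(np.uint8) * 255
    mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, np.ones((5, 5), np.uint8))
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return None
    x, y, bw, bh = cv2.boundingRect(max(contours, key=cv2.contourArea))
    return lower[y:y + bh, x:x + bw]                  # cropped lip patch
```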
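For the learning-based lip-motion decision over a face sequence, the sketch below builds a fixed-length feature vector from frame-difference statistics over the lip region and trains an SVM on manually labelled sequences. The concrete features and classifier used in the paper may differ; these are illustrative choices.

```python
# A sketch of the sequence-level lip-moving decision: simple frame-difference
# statistics over the lip region are aggregated into a fixed-length feature
# vector, and an SVM is trained to separate speaking from non-speaking
# sequences. Features, thresholds, and the classifier are assumptions.
import cv2
import numpy as np
from sklearn.svm import SVC

def lip_motion_features(lip_frames):
    """lip_frames: list (length >= 2) of equal-size grayscale lip images."""
    diffs = [cv2.absdiff(a, b).astype(np.float32)
             for a, b in zip(lip_frames[:-1], lip_frames[1:])]
    per_frame = np.array([[d.mean(),            # average change
                           d.std(),             # spread of change
                           (d > 20).mean()]     # fraction of moving pixels
                          for d in diffs])
    # Aggregate over the whole sequence so sequences of different lengths
    # map to the same fixed-length feature vector.
    return np.concatenate([per_frame.mean(axis=0),
                           per_frame.max(axis=0),
                           per_frame.std(axis=0)])

def train_lip_classifier(sequences, labels):
    """sequences: list of lip-frame lists; labels: 1 = speaking, 0 = not."""
    X = np.stack([lip_motion_features(seq) for seq in sequences])
    clf = SVC(kernel="rbf", gamma="scale")
    clf.fit(X, np.asarray(labels))
    return clf
```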