A Chinese Multimodal Corpus Using Depth Information

Posted on:2019-09-03

Degree:Master

Type:Thesis

Country:China

Candidate:L Y Wang

Full Text:PDF

GTID:2428330626952399

Subject:Computer Science and Technology

Abstract/Summary:

As one of the earliest modes of human-computer interaction, speech recognition has made great progress over the past decades. In particular, audio-visual speech recognition has significantly improved recognition performance when the audio is corrupted by noise. Research on audio-visual speech recognition requires a standard audio-visual corpus as its data foundation. However, research in China on audio-visual corpora remains insufficient. Many Chinese audio-visual corpora suffer from limited vocabulary and poor audio and video quality, and the quality of their planar (2D) images is highly susceptible to factors such as illumination, speaker head rotation, and occlusion. In this thesis, depth information is integrated into a Chinese audio-visual corpus. A synchronous multimodal data acquisition system was developed using Microsoft's second-generation Kinect multi-sensor device, and a small corpus was collected first as a pilot. A multimodal speech recognition experiment conducted on this pilot corpus showed that depth information is of great help to speech recognition. The thesis then designed an automatic algorithm for selecting the corpus text. In the end, 146 sentences were selected as the final recording material, covering 78% of toneless syllables and 93.3% of inter-syllabic bi-phones. Multimodal data from 69 speakers, comprising audio, color video, depth images, and 3D information, was recorded in a professional recording room. The resulting Chinese multimodal corpus totals 22.4 hours and occupies more than 6 TB of disk space. Finally, the thesis designed benchmark experiments for isolated word recognition and continuous speech recognition on the multimodal corpus, and analyzed the contribution and value of depth data to speech recognition research.
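The abstract does not describe how the automatic selection algorithm works. As an illustration only, sentence selection for phonetic coverage is often framed as a greedy set-cover problem, and the sketch below follows that framing. It is an assumption, not the thesis's method: the function names `greedy_select` and `coverage`, the sentence ids, and the toy pinyin units are all hypothetical.

```python
from typing import Dict, List, Set


def greedy_select(candidates: Dict[str, Set[str]], max_sentences: int) -> List[str]:
    """Greedy set-cover sketch (hypothetical, not the thesis's algorithm).

    `candidates` maps a sentence id to the set of units (for example
    syllables or bi-phones) it contains. At each step the sentence that
    adds the most not-yet-covered units is chosen.
    """
    covered: Set[str] = set()
    chosen: List[str] = []
    remaining = dict(candidates)
    while remaining and len(chosen) < max_sentences:
        # Iterating in sorted order makes ties resolve to the smallest id.
        best = max(sorted(remaining), key=lambda s: len(remaining[s] - covered))
        if not remaining[best] - covered:
            break  # no remaining sentence adds a new unit
        chosen.append(best)
        covered |= remaining.pop(best)
    return chosen


def coverage(chosen: List[str], candidates: Dict[str, Set[str]], inventory: Set[str]) -> float:
    """Fraction of the target unit inventory covered by the chosen sentences."""
    covered: Set[str] = set()
    for sentence_id in chosen:
        covered |= candidates[sentence_id]
    return len(covered & inventory) / len(inventory)


# Toy data (hypothetical): four candidate sentences and a six-unit inventory.
candidates = {
    "s1": {"ni", "hao", "ma"},
    "s2": {"ni", "hao"},
    "s3": {"zai", "jian"},
    "s4": {"ma", "zai"},
}
inventory = {"ni", "hao", "ma", "zai", "jian", "xie"}

chosen = greedy_select(candidates, max_sentences=2)
print(chosen, coverage(chosen, candidates, inventory))
```

On the toy data, two sentences cover five of the six inventory units. Coverage figures such as the 78% and 93.3% reported above would be computed the same way, once per unit type, against the full syllable or bi-phone inventory.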

Keywords/Search Tags:

Speech Recognition, Kinect, Depth Image, Corpus, Multimodal, Phoneme Balance

Related items

1	Research On Speech Phoneme Recognition Based On Deep Learning
2	Research On Statistical Language Model Of Large-Vocabulary Continuous Speech Recognition System
3	Research On Texture-less Objects Recognition And Pose Estimation Based On Kinect V2 Sensor
4	The Research Of 3D Face Recognition Technology Based On Depth Image From Kinect
5	Deep Emotion Recognition Based On Speech And Semantics
6	Teaching Gesture Recognition Based On Depth Image Of Kinect
7	The Establishment And Application Of Uyghur Speech Corpus Based On Online
8	Study On Gesture Recognition In Kinect's Depth Image
9	The Research Of Gesture Recognition Based On Depth Image From Kinect
10	Research And Application Of Gesture Recognition Based On Depth Image