Research On Audio-visual Speech Separation

Posted on:2021-06-02

Degree:Master

Type:Thesis

Country:China

Candidate:C D Li

Full Text:PDF

GTID:2518306503991059

Subject:Computer technology

Abstract/Summary:

Humans can track and distinguish the speech of any target speaker in a complex environment where multiple speakers talk simultaneously. The problem of building an auditory model that gives intelligent machines a similar capability is known as the cocktail party problem. Speech separation is one of the key technologies for solving the cocktail party problem. In recent years, with the development of deep learning, speech separation combined with deep learning has made significant progress. However, most studies use only the audio information in real scenes, and information from other modalities has not been effectively used. From the perspective of multi-modal fusion, this thesis explores methods for incorporating the visual information in a real scene into the speech separation system to improve its performance. First, we design an audio-visual speech separation system that extracts the visual information of the target speakers and incorporates it into the speech separation task. Second, we explore different ways of incorporating visual information and develop an attention-based mechanism to make better use of it. Furthermore, we design an approach that extracts speaker contextual information directly from the mixed audio and the target speakers' visual information. Integrating the contextual information of the target speaker into the speech separation system yields a further performance improvement. Experiments are carried out on the LRS2 and VoxCeleb2 audio-visual datasets, and the proposed methods are systematically verified. The experimental results show that, compared with the baseline system, the proposed methods achieve significant and consistent performance improvements.
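The abstract does not describe how the attention-based mechanism is built, so the following is only a minimal illustrative sketch of one common way to fuse the two modalities: each audio frame attends over the visual frames with scaled dot-product attention, and the attended visual context is concatenated to the audio feature. The function names (`attend_visual`, `softmax`, `dot`), the toy feature values, and the assumption that audio and visual features share the same dimension (no learned projections) are all hypothetical and are not taken from the thesis.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def attend_visual(audio_frames, visual_frames):
    """Fuse audio and visual features with scaled dot-product attention.

    Assumption (not from the thesis): audio frames act as queries and
    visual frames act as both keys and values, with a shared feature size.
    Returns one fused vector per audio frame: [audio ; attended visual].
    """
    dim = len(visual_frames[0])
    fused = []
    for a in audio_frames:
        # Similarity of this audio frame to every visual frame.
        scores = [dot(a, v) / math.sqrt(dim) for v in visual_frames]
        weights = softmax(scores)
        # Weighted sum of visual frames gives the visual context vector.
        context = [
            sum(w * v[i] for w, v in zip(weights, visual_frames))
            for i in range(dim)
        ]
        fused.append(list(a) + context)
    return fused


# Toy example: 3 audio frames and 2 visual frames, feature size 2.
audio = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
visual = [[1.0, 0.0], [0.0, 1.0]]
fused = attend_visual(audio, visual)
```

In a real system the queries, keys, and values would normally pass through learned projections, and the fused features would feed a separation network that estimates a mask or waveform for the target speaker. Those parts are omitted here.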

Keywords/Search Tags:

speech separation, audio-visual, multi-modal, cocktail party problem

Related items

1	Research On Multi-modal Speech Separation Based On Audio-visual Combination
2	Research On Audio Visual Fusion Speech Separation Method For Multi-person Dialogue Robot
3	Research On Speech Separation Based On Visual Assistance
4	Monaural Multi-speaker Speech Separation And Recognition
5	Audio-Visual Multi-Modal Fusion Approach Research And Application
6	Research On Multimodal Speech Separation Based On Face Video And Audio
7	Research On Speech Separation Algorithm Based On Deep Learning
8	Research On Real-time Extraction Of Target Person’s Speech In Multi-person Speech Scene Based On Single-channel
9	Research On Audio-visual Cross-modal Sound Source Separation
10	Research And Implementation Of Multi-speaker Speech Separation Based On Deep Learning