
Research on Robust Speech Recognition in Adverse Acoustic Environments

Posted on: 2021-03-05
Degree: Master
Type: Thesis
Country: China
Candidate: R K He
GTID: 2428330626454083
Subject: Electronic and communication engineering
Abstract/Summary:
Since ancient times, speech has been the most common means of human communication: through language we express thoughts and needs and pass on civilization. Speech has therefore played an important role in the development of human civilization and social progress. Within artificial intelligence, speech recognition converts human speech into text that machines can process, so that people can interact with machines and machines can respond correctly. It is an important bridge for natural human-machine interaction and can greatly promote the development of artificial intelligence. In real life, however, speech reaches both machines and human listeners indirectly, through air and other media. The signal is therefore highly vulnerable to interference from various kinds of noise and to distortion from echo, reverberation, and other effects of the environment; in more complex acoustic scenes, the target speech may be masked completely. This poses a great challenge to deploying speech recognition systems in real-world scenes. This thesis studies the problem by combining the front end and back end of speech recognition, using enhancement algorithms such as speech separation and noise reduction. In addition, robustness in realistic, complex scenes must also account for variability in the speaker, such as fast or slow speech, so the thesis also studies robustness to changes in speaking rate. The main work of the thesis is as follows:

(1) A baseline speech recognition system is built. To study the robustness of speech recognition in complex acoustic scenes, the thesis validates its algorithms on CHiME-5, an international robust speech recognition evaluation task. The baseline system, which combines traditional beamforming with a deep learning speech recognition system, is rebuilt first.

(2) A robustness study on noise reduction and speech separation. Because non-stationary noise and speaker overlap are widespread in the dataset, a single front-end enhancement stage followed by a traditional speech recognition back end cannot recognize the speech well. The thesis therefore proposes improvements that address the different effects of noise on the speech separation and multi-channel noise reduction algorithms. Speech separation and noise reduction are carried out in separate steps, and different mask estimation criteria are designed for blind source separation, so that the parameters of speech separation and of the minimum variance distortionless response (MVDR) spatial filter are updated separately. Combining the speech enhancement and speech separation modules lets the system handle the robustness problems caused by noise and by interfering speakers at the same time. In a comparison of three recognition systems with different structures, the proposed algorithm reduces the word error rate (WER) by 10.91 percentage points relative to the baseline speech recognition system.

(3) A robustness study on fast and slow speech. Besides the robustness problems caused by the external environment, the effect of changes in speaking rate on system robustness is also an active research topic. The effect is particularly significant for voice wake-up, a sub-task of speech recognition. The thesis therefore first builds an end-to-end wake-up system to replace the traditional keyword/filler system, using word-level modeling instead of frame-level phoneme modeling to reduce the model's dependence on language information. It then models the keyword data with the region proposal network (RPN) from image recognition. Finally, the experimental results show that the proposed algorithm is more robust in fast and slow speech scenes.
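
The result in item (2) is stated as a WER reduction in percentage points, that is, an absolute difference between two error rates and not a relative improvement. The sketch below shows how WER is conventionally computed (word-level edit distance divided by the number of reference words) and how an absolute reduction differs from a relative one. It is a minimal illustration and not the scoring tool used in the thesis; the function names and the example sentences and rates are hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words (standard Levenshtein recurrence).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


def absolute_reduction_points(baseline_wer: float, improved_wer: float) -> float:
    """Absolute WER reduction, in percentage points."""
    return (baseline_wer - improved_wer) * 100.0


def relative_reduction_percent(baseline_wer: float, improved_wer: float) -> float:
    """Relative WER reduction, as a percentage of the baseline WER."""
    return (baseline_wer - improved_wer) / baseline_wer * 100.0


# Hypothetical example: one deleted word and one substituted word
# against a four-word reference.
wer = word_error_rate("turn on the light", "turn the lights")
```

With hypothetical rates of 0.80 for a baseline and 0.70 for an improved system, `absolute_reduction_points` gives about 10 percentage points while `relative_reduction_percent` gives about 12.5 percent, which is why the unit matters when comparing reported gains.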
Keywords/Search Tags: Robust Speech Recognition, Blind Source Separation, Deep Learning, End-to-End System