
Mobile Platform Oriented Multi-channel Speech Enhancement And Recognition Technology

Posted on: 2021-08-02
Degree: Master
Type: Thesis
Country: China
Candidate: J Y Dai
Full Text: PDF
GTID: 2518306503464054
Subject: Electronic Science and Technology
Abstract/Summary:
Speech enhancement and recognition have become a major focus of academic research, and their application scenarios keep expanding. Mobile platforms are an important application scenario for this technology. However, traditional speech enhancement algorithms require accurate microphone array information and rely on far-field signal assumptions, neither of which can be guaranteed on mobile platforms, so these algorithms perform poorly there. In addition, mobile platforms cannot run algorithms with high space complexity, while traditional speech enhancement and recognition algorithms are highly complex. Traditional algorithms therefore cannot be applied directly to mobile platforms. This thesis studies two major problems in mobile scenarios: multi-channel speech enhancement and speech recognition.

First, the thesis summarizes current theoretical research and application results in speech enhancement, especially beamforming algorithms based on time-frequency masks. Mask-based beamforming achieves better multi-channel signal enhancement, but it does not perform well on mobile platforms. The thesis proposes an improved time-frequency mask that makes better use of the characteristics of mobile device microphones. Because most of the algorithm's running time is spent in the neural network decoding part, the thesis also compresses the number of nodes in the hidden layer of the neural network, which effectively reduces computation and space complexity while maintaining the algorithm's performance.

Second, the thesis proposes a scene-associated acoustic model, which ties the acoustic model to the scene and filters out the parts that are unrelated to it. This reduces the complexity of the speech recognition algorithm and effectively addresses errors in choosing between similar paths during decoding, improving decoding accuracy without relying on natural language processing.

On the CHiME4 dataset, the improved time-frequency mask beamforming algorithm improves the PESQ score by 0.16 over the classic delay-and-sum algorithm and by 0.06 over the cluster-based algorithm. In recognition experiments, combined with the channel correlation algorithm, the speech recognition algorithm achieves a 48.3% relative WER reduction compared with the baseline algorithm; using the scene-associated acoustic model reduces WER by a further 12.6% relative.
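As a rough illustration of the general mask-based beamforming idea summarized above, the following NumPy sketch shows how a time-frequency mask can drive a beamformer. It is not the thesis's improved mask or its neural network: the mask is assumed to be given (for example, by a mask-estimation network), the MVDR formulation used here is one common choice rather than the method reported in the abstract, and the function names `masked_covariance`, `mvdr_weights`, and `beamform`, along with their parameters, are hypothetical.

```python
import numpy as np


def masked_covariance(stft, mask, eps=1e-8):
    """Mask-weighted spatial covariance matrix for each frequency bin.

    stft: complex array, shape (channels, freqs, frames)
    mask: real array with values in [0, 1], shape (freqs, frames)
    returns: complex array, shape (freqs, channels, channels)
    """
    weighted = stft * mask[None, :, :]
    # cov[f, c, d] = sum_t mask[f, t] * y_c[f, t] * conj(y_d[f, t])
    cov = np.einsum("cft,dft->fcd", weighted, stft.conj())
    return cov / (mask.sum(axis=1)[:, None, None] + eps)


def mvdr_weights(cov_speech, cov_noise, ref=0, diag_load=1e-6):
    """MVDR weights w = (Phi_n^-1 Phi_s) u / trace(Phi_n^-1 Phi_s).

    u is a one-hot vector selecting the reference microphone `ref`.
    returns: complex array, shape (freqs, channels)
    """
    channels = cov_noise.shape[-1]
    # Diagonal loading keeps the noise covariance invertible.
    loaded = cov_noise + diag_load * np.eye(channels)[None]
    numerator = np.linalg.solve(loaded, cov_speech)  # Phi_n^-1 Phi_s
    trace = np.trace(numerator, axis1=1, axis2=2)[:, None]
    return numerator[:, :, ref] / (trace + 1e-12)


def beamform(stft, weights):
    """Apply w^H y per time-frequency bin; returns shape (freqs, frames)."""
    return np.einsum("fc,cft->ft", weights.conj(), stft)
```

In this sketch the speech mask and its complement (`1 - mask`) would be passed to `masked_covariance` to obtain the speech and noise covariances, so no microphone geometry or far-field assumption is needed; the spatial information comes entirely from the mask-weighted statistics.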
Keywords/Search Tags: beamforming, speech enhancement, time-frequency mask, acoustic model