Research And Implementation Of Speech Enhancement Algorithm Based On Hybrid Feature Awareness

Posted on: 2023-12-24
Degree: Master
Type: Thesis
Country: China
Candidate: Q He
Full Text: PDF
GTID: 2558306845990209
Subject: New Generation Electronic Information Technology (including quantum technology, etc.) (Professional Degree)
Abstract/Summary:
The main difficulty in single-channel speech enhancement is handling multi-source noise interference and diverse, time-varying human speech in complex and unknown environments. Traditional algorithms suffer from mismatch with their prior models, so they cannot handle the complex and dynamic scenarios of intelligent speech interaction applications. Although machine learning algorithms have achieved significant improvements in dealing with burst noise, their performance depends heavily on the completeness of the training set and the scale of the model, which prevents wide deployment on small interactive devices. To address these bottlenecks, this thesis proposes a general speech enhancement framework based on hybrid feature-aware algorithms, which efficiently improves the real-time scene noise suppression and personalized target speech enhancement capability of existing algorithms. Network learning strategies are designed and optimized for harsh conditions: robustness to background noise, fast training on unfamiliar scenes, and complex multi-source scenes. The speech enhancement system is then implemented in software and hardware and its performance is tested. The detailed contents are as follows:

(1) A general speech enhancement framework based on hybrid feature-aware algorithms is proposed to address the performance deterioration caused by unfamiliar or burst noise. It consists of a background noise-aware module based on multi-head attention and a target-aware module based on speech phonemes. The background noise-aware module has two parts: extraction of multidimensional, scene-based noise feature bases, and prediction of background noise features with a multi-head attention mechanism. The target-aware module extracts a personalized phonetic posteriorgram of the target speech from the noisy signal, introducing deep semantic information about human speech expression. These multidimensional features are adaptively fused and can be embedded into any single-channel speech enhancement algorithm to improve its enhancement performance. Experiments show that the speech quality and intelligibility evaluation metrics improve by 6.61% and 2.10% on average in unseen noise scenes, and subjective listening experiments show that the framework gives a better listening experience.

(2) Network learning strategies are proposed for complex, unfamiliar, multi-source harsh application environments, including: a background noise robustness enhancement and parameter optimization strategy based on hybrid adversarial learning; fast training and migration to unseen noise scenes based on pre-training and model fine-tuning; and a curriculum learning training strategy for multi-source noisy environments. Experiments demonstrate that these learning strategies improve speech quality and intelligibility scores by 8.01% and 2.88% in unseen noisy scenes, reduce training time, and adapt better to a wider range of noise scenarios.

(3) The software and hardware implementation, algorithm porting, and performance testing of the speech enhancement system are carried out on the Jetson AGX Xavier artificial intelligence development module. A method for optimizing the real-time performance of the system based on TensorRT is implemented. The experimental results show that the computational speed of the speech enhancement system can be increased by an average of 5.4 times, significantly reducing its processing time.

The results of this thesis provide a theoretical basis, a solution, and experimental support for implementing realistic small-scale real-time speech enhancement systems, which can be applied in intelligent human-computer interaction, smart homes and smart cities, smart surveillance, and intelligent transportation, and have good academic and economic significance.
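As a rough illustration of how background noise features might be predicted from noise feature bases with multi-head attention, the sketch below lets each noisy frame attend over a small bank of noise bases and returns an attention-weighted noise feature per frame. This is not the thesis's implementation: the function name `predict_noise_features`, the shapes, the random projection matrices, and the choice of frames as queries with bases as keys and values are all assumptions made for illustration, and the usual output projection is omitted.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def predict_noise_features(frames, noise_bases, w_q, w_k, w_v, num_heads):
    """Hypothetical noise-aware attention step (illustrative only).

    frames:      (T, d_model) features of T noisy frames, used as queries
    noise_bases: (K, d_model) bank of K noise feature bases, used as keys/values
    w_q, w_k, w_v: (d_model, d_model) projection matrices
    Returns (noise_features of shape (T, d_model),
             attention weights of shape (num_heads, T, K)).
    """
    T, d_model = frames.shape
    K = noise_bases.shape[0]
    assert d_model % num_heads == 0, "d_model must be divisible by num_heads"
    d_head = d_model // num_heads

    # Project, then split the feature dimension into heads: (H, T or K, d_head).
    q = (frames @ w_q).reshape(T, num_heads, d_head).transpose(1, 0, 2)
    k = (noise_bases @ w_k).reshape(K, num_heads, d_head).transpose(1, 0, 2)
    v = (noise_bases @ w_v).reshape(K, num_heads, d_head).transpose(1, 0, 2)

    # Scaled dot-product attention of every frame over every noise basis.
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)  # (H, T, K)
    weights = softmax(scores, axis=-1)
    heads = weights @ v                                   # (H, T, d_head)

    # Concatenate the heads back into one feature vector per frame.
    noise_features = heads.transpose(1, 0, 2).reshape(T, d_model)
    return noise_features, weights


# Toy usage with random data; all sizes are arbitrary assumptions.
rng = np.random.default_rng(0)
T, K, d_model, H = 5, 8, 16, 4
frames = rng.standard_normal((T, d_model))
bases = rng.standard_normal((K, d_model))
w_q, w_k, w_v = (
    rng.standard_normal((d_model, d_model)) / np.sqrt(d_model) for _ in range(3)
)
noise_feat, attn = predict_noise_features(frames, bases, w_q, w_k, w_v, H)
print(noise_feat.shape, attn.shape)
```

In a trained system the projections would be learned and the resulting per-frame noise feature would be fused with the other features before being passed to the enhancement network; how that fusion is done is not specified in the abstract.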
Keywords/Search Tags: Speech enhancement, Hybrid feature-aware network, Background noise-aware, Target speech feature-aware, Hybrid adversarial learning, Mel-scaled weighted reconstruction loss function, Curriculum learning strategy