| With the development of voice technology, the demand for intelligent voice devices in daily life is growing, and speech enhancement technology has attracted widespread attention from academia and industry. As one of the key technologies in speech recognition tasks, speech enhancement has long been a research hotspot and has broad application prospects in speech communication, smart homes, speech translation, and other fields. In the speech enhancement task, factors such as the latent interactions among target speech features and the correlation of contextual and timing information affect enhancement performance. How to exploit the latent correlations among speech features and the context- and timing-related information in complex acoustic scenarios is therefore an important entry point for research on speech enhancement. Based on a review of a large body of research literature at home and abroad, this thesis first outlines the applications of single-channel speech enhancement, the current state of research at home and abroad, and the problems and challenges faced; second, it describes the classification, architecture, and deep network models of single-channel speech enhancement. It then proposes a single-channel speech enhancement method based on a dual-complex convolutional attention aggregation recurrent network to fully learn the latent correlations among and within speech feature blocks. It further proposes a single-channel speech enhancement method based on a dual-complex convolutional attention aggregation temporal optimization network to improve the model's representation of contextual time-series changes in speech along two dimensions: within a time period and between time periods. Finally, based on these two methods, a single-channel speech enhancement prototype system is designed and implemented.
The main research contributions of this thesis are as follows: (1) Considering the latent correlations of signals in speech, a single-channel speech enhancement method based on a dual-complex convolutional attention aggregation recurrent network is proposed. The method constructs an inter-block attention mechanism and an intra-block attention mechanism for speech features in order to learn the latent inter-block and intra-block correlations in high-dimensional speech features more effectively. The resulting attention network is embedded into a dual-branch deep complex convolutional network to guide the learning of more speech phase information and improve the final enhancement performance. Experimental results on a relevant speech enhancement dataset verify that the dual-complex convolutional attention aggregation recurrent network model is effective. Compared with state-of-the-art speech enhancement methods, the method improves the overall performance of single-channel speech enhancement. (2) To address the impact of the contextual time-series information of speech on single-channel speech enhancement performance, a single-channel speech enhancement method based on a dual-complex convolutional attention aggregation temporal optimization network is proposed, which considers the latent correlations between time periods and within a time period in the contextual timing variation of speech signals. To further address the inadequate representation of contextual time-series information in speech enhancement tasks, a complex time-series network is constructed. The network slices speech features along the temporal dimension, treats the time slices within a time period and the time slices across time periods as time cycles, and learns the within-period and between-period variations separately, which improves the representation of contextual time-series changes in speech. Finally, the within-period and between-period features are adaptively weighted and fused to produce more robust contextual timing features. Experimental results on a single-channel speech enhancement dataset show that the dual-complex convolutional attention aggregation temporal optimization network model further improves the overall performance of the single-channel speech enhancement task and improves the objective quality and intelligibility of the estimated speech.
(3) A single-channel speech enhancement prototype system is designed and implemented using the Python language, the PyTorch deep learning framework, and PyQt5 for the system interface. The system integrates four functions: speech data preprocessing, model training, speech enhancement, and audition and visualization. It has a clean interface, is simple to operate, and offers a good interactive experience. In terms of functionality, it implements the single-channel speech enhancement method that integrates feature attention aggregation and contextual timing optimization, which demonstrates the usability of the methods proposed in this thesis. |
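The temporal slicing and adaptive fusion described in contribution (2) can be illustrated with a minimal sketch. It is not the thesis's network: the function names, the zero padding of the last period, and the softmax-normalised fusion weights are all assumptions made for illustration, and plain Python lists stand in for complex-valued feature tensors.

```python
import math


def split_into_periods(frames, period_len, pad_value=0.0):
    """Cut a frame sequence into consecutive periods of `period_len` slices.

    Assumption: the tail is padded with `pad_value` so all periods have equal length.
    """
    if period_len <= 0:
        raise ValueError("period_len must be positive")
    padded = list(frames)
    remainder = len(padded) % period_len
    if remainder:
        padded.extend([pad_value] * (period_len - remainder))
    return [padded[i:i + period_len] for i in range(0, len(padded), period_len)]


def intra_period_sequences(periods):
    """One sequence per period: the slices inside that period, in time order."""
    return [list(period) for period in periods]


def inter_period_sequences(periods):
    """One sequence per slice position: that position taken from every period."""
    return [list(column) for column in zip(*periods)]


def adaptive_fuse(intra_feat, inter_feat, logits):
    """Weighted sum of two equally long feature lists.

    `logits` holds two numbers (hypothetical stand-ins for learnable parameters);
    a softmax turns them into weights that sum to 1.
    """
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    w_intra, w_inter = (e / sum(exps) for e in exps)
    return [w_intra * a + w_inter * b for a, b in zip(intra_feat, inter_feat)]
```

For example, `split_into_periods([1, 2, 3, 4, 5, 6, 7], 3)` yields three periods of three slices each. `inter_period_sequences` then groups the first slice of every period together, the second slices together, and so on. In a trained model each view would be processed by its own sequence module before fusion, and the fusion weights would be learned rather than fixed.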