Speech Separation Technology Based On Deep Learning

Posted on: 2021-03-10
Degree: Master
Type: Thesis
Country: China
Candidate: J M Guo
GTID: 2518306110497914
Subject: Electronics and Communications Engineering
Abstract/Summary:
With the rapid development of computers and the Internet, speech separation technology has been widely applied in areas such as hearing aids, mobile communications, smart home equipment, and speech signal processing, and it has attracted growing attention from researchers. Because of noise interference in real environments, obtaining clean target speech efficiently and quickly remains an active research problem. Deep-learning-based speech separation recasts the separation task as a machine learning problem. Compared with traditional speech separation techniques, it has important research significance and broad prospects, and it can be widely used in hearing aid devices and as a front-end module for speech recognition.

To improve the performance of deep-learning-based speech separation systems, this thesis studies the algorithms used in the different modules of such a system and analyzes the advantages and disadvantages of a speech separation system based on the long short-term memory (LSTM) network. LSTM networks make better use of the temporal correlation of speech, but they suffer from long training times, and measures such as speech quality still need improvement. From the perspective of model optimization, and in order to reduce computational cost and shorten training time, the speech separation system is built with a two-gate recurrent unit, the gated recurrent unit (GRU), and with a single-gate recurrent unit, both chosen in light of the three-gate structure of the LSTM. Simulation experiments show that using GRUs improves speech separation performance while shortening training time. On this basis, the following two improvements are proposed to obtain better separation performance.

(1) When the GRU network structure is used to improve separation performance, the performance indicators improve unevenly. Examining the training criterion of the model shows that the traditional mean square error does not match the speech evaluation indices well. The training criterion, that is, the loss function, is therefore improved using the calculation principle of the hit rate minus false alarm rate (HIT-FA) index. A custom loss function is proposed that uses a weighted harmonic mean to calculate the gap between the speech evaluation index and the predicted and true values, so that the estimated target speech matches the evaluation index better while approaching the ideal output. Experiments show that the improved loss function effectively raises HIT-FA, short-time objective intelligibility (STOI), and perceptual evaluation of speech quality (PESQ), yielding better speech separation performance.

(2) To improve separation performance further, beyond the changes to the internal structure and training criterion of the model, the overall model structure is revised: the input and output stages are improved by combining the attention mechanism with the masking effect of the human ear. A self-attention mechanism is applied to the input signal to obtain a sequence of attention weights that identifies the frames in which the target speech is dominant. After the gated recurrent network, an attention mechanism is applied to the output stage so that the final result focuses more on the separated target speech. Experiments show that the proposed model structure with attention effectively improves the STOI and PESQ of the separated speech and further suppresses the residual noise in the results.

Finally, the thesis summarizes and analyzes the work done, reviews its results and shortcomings, and outlines directions for future work.
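The hit rate minus false alarm rate index named in the abstract is commonly computed on binary time-frequency masks: the hit rate is the fraction of target-dominant units (ideal mask value 1) that the estimated mask also labels 1, and the false alarm rate is the fraction of noise-dominant units (ideal mask value 0) that the estimated mask wrongly labels 1. The sketch below illustrates only that standard metric, not the thesis's custom loss; the abstract does not state how the weighted harmonic mean is formed, so it is not reproduced here. The function name `hit_minus_fa` and the flat-list mask representation are assumptions made for illustration.

```python
def hit_minus_fa(ideal_mask, estimated_mask):
    """Return (hit, fa, hit - fa) for two equally sized flat 0/1 masks.

    Illustrative sketch: masks are flattened lists of time-frequency
    units, 1 = target-dominant, 0 = noise-dominant.
    """
    if len(ideal_mask) != len(estimated_mask):
        raise ValueError("masks must have the same length")

    pairs = list(zip(ideal_mask, estimated_mask))
    target_units = sum(1 for ideal, _ in pairs if ideal == 1)
    noise_units = len(pairs) - target_units

    # Target-dominant units correctly kept by the estimated mask.
    hits = sum(1 for ideal, est in pairs if ideal == 1 and est == 1)
    # Noise-dominant units wrongly kept by the estimated mask.
    false_alarms = sum(1 for ideal, est in pairs if ideal == 0 and est == 1)

    # Guard against masks with no target or no noise units.
    hit = hits / target_units if target_units else 0.0
    fa = false_alarms / noise_units if noise_units else 0.0
    return hit, fa, hit - fa


ideal = [1, 1, 1, 1, 0, 0, 0, 0]
estimate = [1, 1, 1, 0, 1, 0, 0, 0]
print(hit_minus_fa(ideal, estimate))  # (0.75, 0.25, 0.5)
```

In this toy case three of the four target-dominant units are kept and one of the four noise-dominant units leaks through, so the score is 0.75 - 0.25 = 0.5. A perfect estimate scores 1.0. Counting units in this way is not differentiable, which is one reason a training loss would need a smooth surrogate rather than this metric directly.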
Keywords/Search Tags: Speech separation, Deep learning, Long short-term memory network, Gated recurrent unit, Loss function, Voice evaluation indicators, Binary mask, Attention