Single-Channel Speech Separation Algorithm Based On Spatial Decomposition

Posted on:2022-03-09

Degree:Master

Type:Thesis

Country:China

Candidate:Y Wang

GTID:2518306602994809

Subject:Computer software and theory

Abstract/Summary:

As the front end of speech signal processing, speech separation has important applications in speech recognition, voiceprint recognition, and keyword spotting. The speech separation task is often called the cocktail party problem: given mixed speech recorded in a complex environment, such as one with multiple speakers, the goal is to build a separation model that extracts the speech of each speaker. Research has shown that the speech of different speakers differs in its spatial distribution, so this thesis uses spatial decomposition algorithms to achieve speech separation.

1. Starting from non-negative matrix factorization (NMF), a typical spatial decomposition algorithm, this thesis analyzes in depth the spatial decomposition idea behind NMF and its application to speech separation. It finds that, in traditional NMF-based speech separation, the bases in the joint dictionary are not linearly independent under the non-negativity constraint. As a result, the weight matrix obtained during separation is indeterminate, and the mixed speech is not separated effectively. This thesis therefore proposes a speech separation algorithm based on the Bayesian principle and NMF. It uses prior information about the mixed speech to estimate the target speech weights and takes those weights as the initial value of the weight matrix, which addresses this defect of traditional NMF separation. Experimental results show that the proposed algorithm outperforms the comparable SNMF and CNMF algorithms on speech separation metrics.

2. Combining the spatial decomposition idea of NMF with deep learning, this thesis proposes a frequency-domain speech separation algorithm based on orthogonal networks (ONN-FSCSS). The network consists mainly of an encoding layer, a separation layer, and a decoding layer. The encoding layer maps the mixed speech into a special separable space. A specially designed separation layer, with multiple channels, orthogonal parameters for each channel, and sparse output, then performs the spatial decomposition. The decoding layer reconstructs the speech from the per-source features produced by the separation layer, completing the separation. The network is trained with a special pre-training method and loss function. Experimental results show that the frequency-domain separation algorithm based on orthogonal networks has advantages over DPCL, PIT, and other comparable algorithms in both parameter count and separation performance.

3. Inspired by the frequency-domain algorithm above, this thesis introduces orthogonal convolutional networks and a grouped orthogonal convolution design, and combines the idea of spatial decomposition with DPRNN to propose a time-domain speech separation algorithm based on orthogonal networks (ONN-TSCSS). The model adopts a framework similar to DPRNN. Its encoding layer contains multiple convolutional groups, with the convolutions of different groups orthogonal to each other. This design reduces the redundancy of the features extracted by a fully convolutional encoding layer and also reduces the number of model parameters. In addition, the speech features of the different sources in the mixed speech are encoded into mutually orthogonal spaces. Dual-path blocks, each focusing on a single group of features, then analyze the inter-frame correlation of the speech and estimate a time-domain mask to complete the separation. Experimental results show that, compared with PF, DPRNN, and other time-domain algorithms, the proposed time-domain algorithm offers good separation performance, strong generalization, and fewer parameters.

In summary, this thesis designs several speech separation schemes from the perspective of spatial decomposition: the BNMF algorithm, which improves on NMF, and the ONN-FSCSS and ONN-TSCSS algorithms, which combine spatial decomposition with deep learning. Compared with other similar separation algorithms, these spatial decomposition algorithms show clear advantages in separation performance or parameter count.
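
For readers unfamiliar with the traditional NMF separation scheme that the abstract takes as its starting point, the following is a minimal sketch, not the thesis's BNMF algorithm. It assumes that per-speaker dictionaries are already available, uses the standard multiplicative update for the Euclidean cost with the joint dictionary held fixed, and reconstructs each source with a soft mask. All names (`estimate_weights`, `separate`, `W1`, `W2`, `V`) and the random toy data are illustrative assumptions; the random initialization of the weight matrix is the step the thesis proposes to replace with a prior-based estimate.

```python
import numpy as np

def estimate_weights(V, W, n_iter=200, eps=1e-9, seed=0):
    """Estimate non-negative weights H such that V ~= W @ H, with W fixed."""
    rng = np.random.default_rng(seed)
    # Random initial weights (assumption; the thesis instead derives the
    # initial value from prior information about the mixed speech).
    H = rng.random((W.shape[1], V.shape[1])) + eps
    for _ in range(n_iter):
        # Standard multiplicative update for the Euclidean cost.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
    return H

def separate(V, W1, W2, n_iter=200, eps=1e-9):
    """Split a mixed magnitude spectrogram V using two speaker dictionaries."""
    W = np.hstack([W1, W2])              # joint dictionary
    H = estimate_weights(V, W, n_iter, eps)
    k1 = W1.shape[1]
    S1 = W1 @ H[:k1]                     # speaker 1 part of the model
    S2 = W2 @ H[k1:]                     # speaker 2 part of the model
    total = S1 + S2 + eps
    # Soft masks distribute the observed mixture between the two sources.
    return V * S1 / total, V * S2 / total

# Toy example: random non-negative matrices stand in for real spectrograms.
rng = np.random.default_rng(1)
W1 = rng.random((16, 3))                 # hypothetical dictionary, speaker 1
W2 = rng.random((16, 3))                 # hypothetical dictionary, speaker 2
V = W1 @ rng.random((3, 20)) + W2 @ rng.random((3, 20))
est1, est2 = separate(V, W1, W2)
```

Because the columns of the joint dictionary need not be linearly independent under the non-negativity constraint, different initial weights can lead to different splits of the same mixture between the two speakers, which is the indeterminacy the abstract describes.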

Keywords/Search Tags:

Single channel speech separation, Orthogonal neural networks, Spatial decomposition, Non-negative matrix factorization, Deep learning

Related items

1	Single Channel Speech Separation Based On Nonnegative Matrix Factorization
2	Single Channel Speech Enhancement And Separation
3	Research On Two Methods Of Single Channel Speech Separation
4	Study On Speech Separation Based On Non-negative Matrix Factorization And Deep Clustering
5	The Research Of Key Techniques Of Speech Separation And Speech Recognition
6	Underdetermined Source Separation And Its Application To Speech Processing
7	Research On Single Channel Speech Enhancement Algorithm Based On Supervised Learning
8	Research On Single-channel Speech Separation Technology Based On Deep Learning
9	Research On Speech Enhancement Algorithm Based On Non-Negative Matrix Factorization
10	Research On Parallel Algorithm Of Deep Transductive Non-negative Matrix Factorization For Speech Separation