Non-negative Matrix Factorization Algorithm And Its Application In Voice Conversion

Posted on: 2017-02-28
Degree: Master
Type: Thesis
Country: China
Candidate: Q M Zhang
Full Text: PDF
GTID: 2308330485464132
Subject: Computer application technology
Abstract/Summary:
Non-negative matrix factorization (NMF) decomposes a non-negative matrix into the product of a non-negative coefficient matrix and a non-negative basis matrix. It thereby represents each data sample as a non-negative linear combination of non-negative components, which captures the subspace of the data or reduces its dimensionality. Compared with PCA, the non-negative representation is physically meaningful. As an effective data processing technique, NMF has been widely used in applications such as speech recognition, voice conversion, face detection and recognition, text analysis and clustering, network security, digital watermarking, image processing, and biomedical engineering. This thesis focuses on a sparse convolutive NMF algorithm and its application to voice conversion. The major contributions are as follows:

(1) A convolutive NMF algorithm based on the Itakura-Saito distance and a sparsity constraint is proposed. Unlike existing NMF algorithms, which are based on the Euclidean distance or the K-L divergence, the proposed algorithm adopts the Itakura-Saito distance as the cost function that measures the error between the original matrix and its reconstruction. The Itakura-Saito distance is scale-invariant, so the smaller elements of the matrix have correspondingly smaller reconstruction errors. Multiplicative update rules are derived from this cost function with a sparsity constraint on the coefficient matrix. Experimental results show that speech reconstructed with the proposed NMF algorithm has higher intelligibility.

(2) The proposed convolutive NMF algorithm is applied to voice conversion. Convolutive NMF can capture the time-delay information in the data, which makes it better suited to processing speech signals. During the training phase, the convolutive NMF algorithm based on the Itakura-Saito distance and the sparsity constraint is used to obtain the time-frequency bases of the source speaker and of the target speaker. During the conversion phase, the time-frequency spectrum matrix of the source speech is decomposed over the source basis matrix to obtain the source coefficient matrix, and the target speech is then reconstructed from this source coefficient matrix and the target basis matrix. Experimental results show that the source speech is effectively converted to the target speech with higher intelligibility.
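
The training and conversion phases described above can be sketched roughly as follows. This is an illustration under several assumptions that the abstract does not settle, not the thesis's algorithm: plain (non-convolutive) NMF stands in for the convolutive model; the factorization is written V ≈ WH with W as the basis matrix and H as the coefficient matrix; the widely used multiplicative updates for the Itakura-Saito divergence are applied, with an L1 sparsity weight simply added to the denominator of the coefficient update, which may differ from the rules derived in the thesis; and the source and target bases are assumed to correspond column by column. All function and parameter names (`train_bases`, `encode`, `convert`, `sparsity`) are hypothetical.

```python
import numpy as np

EPS = 1e-12  # guards against division by zero and log(0)


def is_divergence(V, V_hat):
    """Itakura-Saito divergence between V and V_hat, summed over all entries."""
    R = (V + EPS) / (V_hat + EPS)
    return float(np.sum(R - np.log(R) - 1.0))


def update_H(V, W, H, sparsity):
    """One multiplicative update of the coefficients H with W held fixed.
    Assumption: the L1 sparsity weight is added to the denominator."""
    V_hat = W @ H + EPS
    numer = W.T @ (V / V_hat**2)
    denom = W.T @ (1.0 / V_hat) + sparsity
    return H * numer / (denom + EPS)


def update_W(V, W, H):
    """One multiplicative update of the bases W with H held fixed."""
    V_hat = W @ H + EPS
    numer = (V / V_hat**2) @ H.T
    denom = (1.0 / V_hat) @ H.T
    return W * numer / (denom + EPS)


def train_bases(V, n_bases, sparsity=0.1, n_iter=200, seed=0):
    """Training phase: learn bases W and coefficients H for one speaker."""
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], n_bases)) + 0.1
    W /= W.sum(axis=0, keepdims=True)
    H = rng.random((n_bases, V.shape[1])) + 0.1
    for _ in range(n_iter):
        H = update_H(V, W, H, sparsity)
        W = update_W(V, W, H)
        # Rescale so each basis sums to one; H absorbs the scale, so W @ H
        # is unchanged and the sparsity penalty cannot be evaded by scaling.
        scale = W.sum(axis=0, keepdims=True) + EPS
        W = W / scale
        H = H * scale.T
    return W, H


def encode(V, W, sparsity=0.1, n_iter=200, seed=0):
    """Decompose a spectrogram over fixed bases W; only H is updated."""
    rng = np.random.default_rng(seed)
    H = rng.random((W.shape[1], V.shape[1])) + 0.1
    for _ in range(n_iter):
        H = update_H(V, W, H, sparsity)
    return H


def convert(V_source, W_source, W_target, sparsity=0.1, n_iter=200):
    """Conversion phase: source coefficients combined with target bases."""
    H_source = encode(V_source, W_source, sparsity, n_iter)
    return W_target @ H_source
```

In this sketch `V` would be a non-negative magnitude or power spectrogram with one column per frame. Calling `train_bases` on each speaker's data yields `W_source` and `W_target`, and `convert` returns an estimate of the target spectrogram, from which a waveform would still have to be synthesized by a separate step not shown here.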
Keywords/Search Tags: Convolutive non-negative matrix factorization, Sparse constraints, Itakura-Saito distance, Voice conversion, Speech intelligibility