Single Channel Speech Enhancement Based On Nonnegative Matrix Factorization

Posted on: 2020-02-02
Degree: Master
Type: Thesis
Country: China
Candidate: J D Liu
GTID: 2428330602952134
Subject: Engineering
Abstract/Summary:
During the acquisition and transmission of audio, noise from system errors and from the recording environment is inevitably mixed into the signal. When speech audio is contaminated with noise, its quality degrades, and the listener may be unable to extract useful information from it. To avoid these problems, researchers apply signal processing techniques to speech signals to improve their quality. This operation is called speech enhancement.

In terms of data type, speech enhancement falls into two categories: single-channel and multi-channel speech signal processing. Single-channel processing operates on a speech signal collected by a single pickup device. Single-channel speech data places lower demands on equipment and needs less storage, which makes it easy to transmit and store in a network environment, so it has attracted wide attention in recent years. At the same time, single-channel data discards the location information of the recording environment, and with it part of the information in the scene. For some problems, this property of single-channel speech increases the difficulty of finding a solution.

From the variety of existing speech enhancement algorithms, this thesis selects a supervised algorithm based on non-negative matrix factorization (NMF) as the object of research and implementation. Under certain assumptions, the algorithm can overcome the limited information available in a single channel. By learning from noise and speech training data, capturing the characteristics of the noise, and processing them in a targeted way, it estimates a clean speech signal of higher quality.

NMF is the core mathematical tool of the speech enhancement algorithm in this thesis. Since it was proposed around 2000, NMF has been improved in many respects, including the iteration algorithm, the objective function, and the constraints on the objective. Thanks to its fast iteration and strong interpretability, its range of applications has kept expanding, from data dimensionality reduction to speech separation and noise reduction. In parallel, progress in signal processing and the related mathematical tools has deepened the understanding of speech. For example, the properties and formant characteristics of speech in the time-frequency domain, and the influence of time-frequency phase on speech quality, have been applied to speech enhancement problems. With the development of these two background technologies, supervised single-channel speech enhancement based on NMF has attracted wide attention for its strong interpretability, its full use of prior knowledge, and its cold-start capability.

To implement a robust speech enhancement system that generalizes across noise types, this thesis studies in detail the supervised NMF enhancement algorithm at the core of the system. Emphasis is placed on how the speech enhancement algorithm uses the full-rank property of the factorization results. The analysis identifies a problem: the step that merges the speech and noise dictionaries may destroy the full-rank property of the resulting dictionary.

The main work of this thesis is as follows. First, the thesis introduces the theoretical basis of the research and clarifies the technical background of the speech enhancement algorithm. Next, the full-rank property of a matrix is analyzed in detail and explained from a geometric point of view. Based on this analysis, the thesis argues that the new dictionary formed by combining the speech dictionary with the noise dictionary may contain redundancy. Geometrically, this redundancy is the intersection of the spaces spanned by the two dictionaries. This thesis calls that intersection the common space and refers to the issue as the common space problem. Data components that lie in the common space may be allocated to speech and noise in any proportion in the final enhancement step. The thesis therefore proposes adding a full-rank check on the matrices used in the algorithm to restrict this phenomenon and further improve the enhancement result. Finally, with the improved algorithm as its core, a complete design of a speech enhancement system is proposed as the research objective of this thesis.

After these improvements, the speech enhancement system implemented in this thesis performs better than an enhancement system based on the traditional algorithm. In the experimental verification, the improved algorithm raises the PESQ speech quality score by about 10%, and the system performance basically meets the requirements of everyday use.
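The pipeline summarized above (train speech and noise dictionaries, merge them, check the merged dictionary for full rank, then separate) can be illustrated with a minimal Python sketch. It is not the thesis's implementation. The Euclidean-cost multiplicative updates, the Wiener-style mask, the SVD tolerance, the choice to raise an error on a failed check, and all function names are assumptions made for illustration; the abstract states only that a full-rank check is added to the matrices used by the algorithm.

```python
import numpy as np

EPS = 1e-12  # guards against division by zero


def nmf(V, rank, n_iter=200, seed=0):
    """Factor V ~ W @ H with multiplicative updates (Euclidean cost, assumed)."""
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], rank)) + EPS
    H = rng.random((rank, V.shape[1])) + EPS
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + EPS)
        W *= (V @ H.T) / (W @ H @ H.T + EPS)
    return W, H


def activations(V, W, n_iter=200, seed=0):
    """Estimate non-negative activations H with the dictionary W held fixed."""
    rng = np.random.default_rng(seed)
    H = rng.random((W.shape[1], V.shape[1])) + EPS
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + EPS)
    return H


def has_full_column_rank(W, tol=1e-6):
    """True if the columns of W are linearly independent up to a relative tol."""
    if W.shape[1] > W.shape[0]:
        return False  # more atoms than frequency bins cannot be full column rank
    s = np.linalg.svd(W, compute_uv=False)  # singular values, descending
    return bool(s[-1] > tol * s[0])


def enhance(V_noisy, W_speech, W_noise):
    """Estimate the speech magnitude spectrogram from a noisy one."""
    W = np.hstack([W_speech, W_noise])  # merged dictionary
    if not has_full_column_rank(W):
        # Hypothetical handling: a shared subspace ("common space") would let
        # energy be split between speech and noise arbitrarily.
        raise ValueError("merged dictionary is rank deficient")
    H = activations(V_noisy, W)
    k = W_speech.shape[1]
    S = W_speech @ H[:k]  # speech part of the reconstruction
    N = W_noise @ H[k:]   # noise part of the reconstruction
    return V_noisy * S / (S + N + EPS)  # Wiener-style soft mask (assumed)
```

In use, `nmf` would be run separately on clean-speech and noise-only magnitude spectrograms to obtain `W_speech` and `W_noise`, and `enhance` would then be applied to the noisy spectrogram. A rank-deficient merged dictionary signals that the two dictionaries share a subspace, which is the situation the abstract calls the common space problem.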
Keywords/Search Tags: Single channel audio, Speech enhancement, Non-negative matrix factorization