Research On Tibetan Voice Activity Detection Algorithm

Posted on: 2022-02-10
Degree: Master
Type: Thesis
Country: China
Candidate: G P Li
GTID: 2518306482973189
Subject: Master of Engineering
Abstract/Summary:
Voice activity detection is a technology that distinguishes voice from non-voice segments in a signal mixed with background noise. It directly affects the performance of speech processing technologies such as speech recognition and speech enhancement, so research on voice activity detection algorithms plays a key role in improving the performance of speech processing. Current voice activity detection algorithms are mainly based either on feature thresholds or on model matching. A feature-threshold algorithm compares a feature value extracted from the signal with a threshold set before the experiment and thereby decides whether the signal is voice or noise. A model-matching algorithm first trains a classifier on training samples and then uses the trained classifier to decide whether each frame of the signal is voice or noise. With the development of neural network technology, neural-network-based algorithms have emerged among the many model-matching voice activity detection algorithms.

Tibetan voice activity detection is basic groundwork for Tibetan speech processing, but compared with languages such as Chinese and English, Tibetan voice activity detection technology is still at an early stage of development. On the one hand, Tibetan voice activity detection still relies on feature-threshold algorithms, and only a few algorithms of this type have been applied to Tibetan; on the other hand, model-matching voice activity detection algorithms have not yet been applied to Tibetan. There is therefore still considerable room for development of Tibetan voice activity detection technology.

In view of this situation, this thesis studies Tibetan voice activity detection with both types of algorithms, feature-threshold and model-matching, and proposes a new voice activity detection algorithm based on a one-dimensional convolutional neural network, which it applies to Tibetan.

First, three commonly used feature-threshold activity detection algorithms are applied to Tibetan. Experimental comparison shows that, among them, the algorithm based on short-term energy and zero-crossing rate and the algorithm based on spectral entropy achieve high accuracy at high signal-to-noise ratio (SNR), but their accuracy declines sharply as the SNR decreases. By comparison, the Tibetan voice activity detection algorithm based on Mel Frequency Cepstral Coefficients (MFCC) performs better across different noise types and SNRs.

Second, to further improve the accuracy and robustness of activity detection on the Tibetan corpus in complex noise environments, this thesis proposes a voice activity detection algorithm based on a one-dimensional Convolutional Neural Network (CNN) and applies it to Tibetan. The main idea of the algorithm is to keep the local perception, weight sharing, and high-level aggregation of the two-dimensional CNN while making its input layer, convolutional layer, and pooling layer one-dimensional. This simplifies the structure of the neural network and achieves accurate detection of Tibetan voice activity in complex noisy environments. Simulation experiments show that the proposed algorithm is more accurate and robust than both the MFCC-based voice activity detection algorithm and the two-dimensional-CNN-based voice activity detection algorithm.
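The feature-threshold approach described in the abstract can be sketched for the short-term energy and zero-crossing-rate case. This is a minimal illustration under stated assumptions, not the thesis's implementation: the frame length, hop, the thresholds `energy_thr` and `zcr_thr`, and the decision rule (voice when energy is high and zero-crossing rate is low) are all hypothetical choices for the example. A rule this simple misses low-energy unvoiced consonants, which is one reason practical detectors often use a two-level (double-threshold) scheme and estimate thresholds from the noise level.

```python
import math

def frames(signal, frame_len, hop):
    """Split a list of samples into frames; a trailing partial frame is dropped."""
    return [signal[i:i + frame_len] for i in range(0, len(signal) - frame_len + 1, hop)]

def short_term_energy(frame):
    # Sum of squared samples in the frame.
    return sum(x * x for x in frame)

def zero_crossing_rate(frame):
    # Fraction of adjacent sample pairs whose signs differ (zero counts as positive).
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)

def energy_zcr_vad(signal, frame_len=256, hop=128, energy_thr=1.0, zcr_thr=0.5):
    """Label each frame 1 (voice) or 0 (non-voice).

    Hypothetical rule: voice if energy exceeds energy_thr AND the zero-crossing
    rate stays below zcr_thr (broadband noise tends to cross zero often).
    """
    labels = []
    for f in frames(signal, frame_len, hop):
        is_voice = short_term_energy(f) > energy_thr and zero_crossing_rate(f) < zcr_thr
        labels.append(1 if is_voice else 0)
    return labels
```

For example, a signal made of 200 silent samples, 200 samples of a low-frequency tone, and 200 more silent samples, split into non-overlapping 100-sample frames, is labelled `[0, 0, 1, 1, 0, 0]`.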
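Spectral entropy, the other feature-threshold method singled out in the abstract, treats the normalized power spectrum of a frame as a probability distribution and computes its entropy. Voiced speech concentrates energy in a few harmonics and formants (low entropy), whereas broadband noise spreads it evenly (high entropy). The sketch below uses a naive O(N^2) DFT to stay dependency-free. The function names and the `threshold` value are hypothetical, and the direction of the rule (low entropy means voice) is the usual assumption rather than a detail taken from the thesis.

```python
import cmath
import math

def power_spectrum(frame):
    """Naive DFT power spectrum over the non-negative frequency bins 0..N//2."""
    n = len(frame)
    spec = []
    for k in range(n // 2 + 1):
        s = sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame))
        spec.append(abs(s) ** 2)
    return spec

def spectral_entropy(frame, eps=1e-12):
    """Shannon entropy (in nats) of the power spectrum normalized to sum to 1."""
    p = power_spectrum(frame)
    total = sum(p) + eps
    probs = [v / total for v in p]
    return -sum(q * math.log(q + eps) for q in probs if q > 0)

def entropy_vad(frame_list, threshold):
    # Hypothetical rule: a peaky spectrum (entropy below threshold) is labelled voice (1).
    return [1 if spectral_entropy(f) < threshold else 0 for f in frame_list]
```

As a sanity check, a 64-sample impulse has a perfectly flat spectrum over 33 bins, so its spectral entropy is ln 33 (about 3.50 nats), the maximum possible; a pure sinusoid centred on one bin has entropy close to 0.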
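The abstract describes the proposed model as a CNN whose input, convolutional, and pooling layers are one-dimensional while keeping local perception and weight sharing. The following sketch shows only what such one-dimensional layers compute in a forward pass; it is not the thesis's network. The layer sizes, kernel values, and the single sigmoid output unit are hypothetical, the weights are untrained, and a real model would be built and trained in a deep-learning framework.

```python
import math

def conv1d(x, kernels, biases, stride=1):
    """Valid 1-D convolution (cross-correlation, as in most deep-learning libraries).

    x: list of input channels, each a list of floats of equal length.
    kernels[o][c]: 1-D kernel linking input channel c to output channel o.
    The same kernel is reused at every position along the axis (weight sharing).
    """
    k = len(kernels[0][0])
    length = len(x[0])
    out = []
    for o, kern_o in enumerate(kernels):
        row = []
        for start in range(0, length - k + 1, stride):
            # Local perception: only k neighbouring values per channel contribute.
            acc = biases[o]
            for c, kern in enumerate(kern_o):
                acc += sum(w * x[c][start + j] for j, w in enumerate(kern))
            row.append(acc)
        out.append(row)
    return out

def relu(x):
    return [[max(0.0, v) for v in ch] for ch in x]

def max_pool1d(x, size):
    # Non-overlapping max pooling along the single axis; a trailing remainder is dropped.
    return [[max(ch[i:i + size]) for i in range(0, len(ch) - size + 1, size)] for ch in x]

def dense_sigmoid(x, weights, bias):
    # Flatten all channels and map them to a voice probability in (0, 1).
    flat = [v for ch in x for v in ch]
    z = bias + sum(w * v for w, v in zip(weights, flat))
    return 1.0 / (1.0 + math.exp(-z))

# Toy forward pass on one frame's feature vector (hypothetical, untrained weights).
frame_features = [[0.1, 0.4, 0.9, 0.3, -0.2, 0.0]]  # 1 channel, 6 values
h = max_pool1d(relu(conv1d(frame_features, [[[0.5, 1.0, 0.5]]], [0.0])), 2)
p_voice = dense_sigmoid(h, [1.0] * len(h[0]), -0.5)
```

Because one kernel slides along a single axis, a one-dimensional layer with kernel length k and C input channels needs only k*C weights per output channel, compared with k*k*C for a square two-dimensional kernel of the same side length. This is one way to read the abstract's claim that the one-dimensional structure simplifies the network.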
Keywords/Search Tags: Tibetan voice activity detection, Spectral entropy, Mel frequency cepstral coefficient, Convolutional neural network