Research On 2D-Haar Acoustic Feature Ultra-Vector And Large Scale Speaker Recognition

Posted on: 2016-11-12
Degree: Doctor
Type: Dissertation
Country: China
Candidate: E M Xie
GTID: 1228330452964745
Subject: Communication and Information Engineering
Abstract/Summary:
With the development of information technology, the demand for information security has become extremely urgent. In identity authentication and sensitive-information monitoring, more and more research and applications have been devoted to biometric technologies. Because it is simple to deploy and low in cost, speaker recognition, also known as voiceprint identification, has always been an important branch of biometric technology.

In recent years, pattern recognition theories (such as template matching methods, probabilistic and statistical methods, and machine learning classifier methods) and feature learning and processing techniques (such as feature extraction and feature generation based on machine learning and data mining algorithms) have received more attention. As a result, the accuracy of speaker recognition systems has gradually improved, and the applications of speaker recognition have steadily expanded.

In speaker recognition, system accuracy declines as the number of target speakers increases. This paper aims to solve this problem by studying the corresponding feature extraction methods, target-speaker classifier training methods, and parallel speaker recognition methods.

The main achievements and innovations of this paper are as follows:

(1) A 2D-Haar acoustic feature ultra-vector generation algorithm is proposed, which uses the acoustic feature diagram to extract and select acoustic features and thereby helps to improve recognition accuracy.

To combine the timing information and the cross-dimensional information of common acoustic feature vectors, we propose the 2D-Haar acoustic feature ultra-vector in the feature processing stage.
The basic principle is to introduce the ideas of joint time-frequency filtering and feature selection. First, Haar-like patterns are computed from frame-based acoustic features using the acoustic feature diagram and the acoustic feature integrogram. Second, for each recognition task, the proposed method selects Haar-like patterns with a machine learning algorithm. Finally, the selected Haar-like feature patterns are used to construct the 2D-Haar acoustic feature ultra-vector, which is then used to train machine learning classifiers. The potential dimension of the 2D-Haar acoustic feature ultra-vector can be higher, and its Haar-like feature patterns are screened for the specific recognition task, so the ultra-vector can improve expressive capacity and recognition accuracy.

Experimental results show that, in audio event recognition, speaker recognition, and speaker gender recognition, the 2D-Haar acoustic feature ultra-vector achieves higher accuracy than commonly used frame-based acoustic features; the highest accuracy improvements for the SVM, AdaBoost, and C5.0 algorithms range from 4.2% to 9.5%.

(2) A fast generation algorithm for the 2D-Haar acoustic feature ultra-vector is proposed, which uses random selection of acoustic feature patterns.

In generating the 2D-Haar acoustic feature ultra-vector, the time consumed by Haar-like pattern selection is a crucial problem, so a new method is proposed to optimize the extraction procedure. The key idea is as follows: in each iteration, we do not extract every Haar-like pattern in the whole space of Haar-like patterns; instead, we randomly select a fixed number of Haar-like patterns.
Because the fixed number of randomly selected patterns is much smaller than the number of original patterns, the feature extraction procedure saves a great deal of time and becomes more efficient.

Experimental results show that, compared with commonly used frame-based acoustic features, the fast generation algorithm for the 2D-Haar acoustic feature ultra-vector is 2.9 to 6.8 times faster in the training phase and 4.9 to 8.9 times faster in the recognition phase, and its highest recognition accuracy improvement ranges from 4.8% to 8.8%.

(3) A speaker recognition algorithm is proposed, which uses a two-iteration training procedure to alleviate the impact of a large number of target speakers on recognition accuracy.

As the number of target speakers increases, the sample density in the feature space continues to increase, which can cause accuracy to decline. This paper proposes a speaker recognition method that uses a two-iteration training procedure to alleviate the impact of a large number of target speakers. In the feature vector generation phase, a different combination of Haar-like patterns is selected for each target speaker to generate a 2D-Haar acoustic feature ultra-vector that differs from person to person. Replacing the commonly used frame-based acoustic features, this ultra-vector increases the differences between feature vectors and reduces the sample density in the feature space. In the speaker classifier training phase, a characteristic of the AdaBoost.MH algorithm, which can be described as “accuracy usually improves when the number of weak classifiers is greater than the feature dimension”, is used to train a speaker classifier whose number of weak classifiers is greater than the dimension of the 2D-Haar acoustic feature vector. These properties can improve the accuracy of the speaker classifier.

Experimental results show that, compared with the GMM-SVM algorithm, the proposed algorithm recognizes faster, is more accurate, and loses accuracy more slowly as the number of target speakers increases; its average recognition accuracy across different target speaker scales exceeds that of the comparative GMM-SVM method by 2.5%.

(4) A parallel speaker recognition algorithm is proposed, which uses CPU multicore technology to improve the efficiency of large-scale speaker recognition applications.

CPU multicore technology can improve the efficiency of large-scale speaker recognition, so the ERF (Extra Random Forests) algorithm is proposed to construct a parallel speaker recognition algorithm. The ERF algorithm is not iterative throughout, so it can gain efficiency through program parallelization. Our experiments used operating system scripts for 16-core program parallelization. The results show that, in the training phase, the parallel ERF method is 5.53 times faster than on a single core, which is 2.3-fold that of the parallel GMM-SVM method and 2.2-fold that of the parallel Turbo-Boost method; in the recognition phase, the parallel ERF method is 8.33 times faster than on a single core, which is 1.9-fold that of the parallel GMM-SVM method and 1.3-fold that of the parallel Turbo-Boost method. In addition, the ERF algorithm selects the acoustic feature vectors for the negative samples by random sampling with replacement before the training stage. This keeps the data of each single decision tree balanced and reduces the data loss caused by sampling the non-target speaker samples.
Experimental results show that the proposed algorithm shows a smaller decline in accuracy than the GMM-SVM algorithm as the number of target speakers increases; its average recognition accuracy across different target speaker scales exceeds that of the comparative GMM-SVM method by 2.7%.
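
The feature computation described in contributions (1) and (2) can be illustrated with a minimal sketch. It is not the dissertation's implementation: it assumes the acoustic feature diagram is a frames-by-coefficients matrix, that the integrogram is a summed-area table of that matrix, and that a Haar-like pattern is a simple two-rectangle difference. All function names, the pattern shape, and the sizes below are hypothetical, and only the random draw of a fixed number of patterns is shown, not the task-specific selection by a machine learning algorithm.

```python
import random


def integrogram(diagram):
    """Summed-area table of the diagram, with a zero row and column prepended."""
    rows, cols = len(diagram), len(diagram[0])
    ii = [[0.0] * (cols + 1) for _ in range(rows + 1)]
    for r in range(rows):
        row_sum = 0.0
        for c in range(cols):
            row_sum += diagram[r][c]
            ii[r + 1][c + 1] = ii[r][c + 1] + row_sum
    return ii


def rect_sum(ii, top, left, height, width):
    """Sum of the rectangle starting at (top, left), using four table lookups."""
    bottom, right = top + height, left + width
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]


def haar_two_rect(ii, top, left, height, width, split_time):
    """Two-rectangle Haar-like value: first half minus second half."""
    if split_time:  # split along the frame (time) axis
        half = height // 2
        return (rect_sum(ii, top, left, half, width)
                - rect_sum(ii, top + half, left, half, width))
    half = width // 2  # split along the coefficient axis
    return (rect_sum(ii, top, left, height, half)
            - rect_sum(ii, top, left + half, height, half))


def random_patterns(n_frames, n_dims, count, rng):
    """Draw a fixed number of even-sized pattern placements at random."""
    patterns = []
    for _ in range(count):
        height = 2 * rng.randint(1, n_frames // 2)
        width = 2 * rng.randint(1, n_dims // 2)
        top = rng.randint(0, n_frames - height)
        left = rng.randint(0, n_dims - width)
        patterns.append((top, left, height, width, rng.random() < 0.5))
    return patterns


def ultra_vector(diagram, patterns):
    """Evaluate every pattern on one diagram and stack the values."""
    ii = integrogram(diagram)
    return [haar_two_rect(ii, *p) for p in patterns]


# Made-up example: 20 frames of 12 coefficients each, 50 random patterns.
rng = random.Random(0)
diagram = [[rng.gauss(0.0, 1.0) for _ in range(12)] for _ in range(20)]
patterns = random_patterns(20, 12, count=50, rng=rng)
vector = ultra_vector(diagram, patterns)
```

Because every rectangle sum costs four lookups regardless of its size, evaluating a pattern takes constant time once the table is built, which is why drawing only a fixed number of patterns bounds the cost per iteration.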
Keywords/Search Tags: speaker recognition, 2D-Haar acoustic feature ultra-vector, Turbo-Boost, ERF, random forests, AdaBoost