Research On Real-Time Audio Decoding And Robust Speaker Recognition System In Network Environment

Posted on:2012-12-03

Degree:Master

Type:Thesis

Country:China

Candidate:X Meng

Full Text:PDF

GTID:2218330362450437

Subject:Computer Science and Technology

Abstract/Summary:

Speaker recognition is the technology of finding a target speaker's voice in audio data. It is of great value in security and criminal investigation. The goal of this thesis is to build a speaker recognition system for the network environment. To build such a system, all kinds of audio and video must first be decoded into a uniform uncompressed audio format in real time. Building on the laboratory's existing real-time audio decoding system, the strengths and weaknesses of different coprocessors for real-time audio decoding are analyzed, and decoding of MP3 (MPEG Audio Layer 3), the format with the largest share of actual network traffic, is ported to the TILE64 many-core chip. This addresses the problem that the existing real-time audio decoding system consumes a large share of CPU (Central Processing Unit) resources and decodes slowly. The new real-time audio decoding system, which includes TILE64-based MP3 decoding, achieves an average decoding speed of 200 Mbps; power consumption is unchanged and the decoding speed is doubled. Secondly, the new system is used to collect large quantities of audio data and decode them into standard processing units: mono audio with a sample rate of 8 kHz, a sampling resolution of 16 bits, and a duration of 10 seconds. The processing units are classified into speech and non-speech using VAD (Voice Activity Detection) and an SVM (Support Vector Machine) with a Gaussian kernel. Speech units were found to account for about one-seventh of the total. From this collection, a real network corpus was built for testing speaker recognition. Next, a general text-independent speaker recognition system is built based on GMM-UBM (Gaussian Mixture Model-Universal Background Model).
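To make the standard processing unit concrete, the following is a minimal sketch of its size and of how a decoded stream could be cut into such units. It assumes the decoded audio is headerless 16-bit PCM; the function name `split_into_units` and the choice to drop a trailing partial unit are illustrative assumptions, not details taken from the thesis.

```python
from typing import List

# Parameters of the standard processing unit as stated in the abstract.
SAMPLE_RATE_HZ = 8000      # 8 kHz
BYTES_PER_SAMPLE = 2       # 16-bit samples
CHANNELS = 1               # mono
UNIT_SECONDS = 10

# Size of one unit of raw PCM in bytes: 8000 * 2 * 1 * 10.
UNIT_BYTES = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * CHANNELS * UNIT_SECONDS


def split_into_units(pcm: bytes) -> List[bytes]:
    """Cut raw PCM into full 10-second units.

    Assumption for illustration: a trailing chunk shorter than one
    unit is discarded rather than padded.
    """
    full_units = len(pcm) // UNIT_BYTES
    return [pcm[i * UNIT_BYTES:(i + 1) * UNIT_BYTES] for i in range(full_units)]


if __name__ == "__main__":
    stream = bytes(25 * SAMPLE_RATE_HZ * BYTES_PER_SAMPLE)  # 25 s of silence
    units = split_into_units(stream)
    print(len(units), "units of", UNIT_BYTES, "bytes each")
```

Under these assumptions each unit is 160,000 bytes of raw audio, which is the granularity at which the speech/non-speech classification would operate.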
Experiments show that the general speaker recognition system works well on the general experimental corpus. On the real network corpus, however, the large quantity of voice data from non-target speakers causes the number of false alarms to far exceed the number of correct recognitions, so the system cannot meet practical requirements. To solve this problem, two speaker confirmation methods are designed and implemented: one based on a high-level semantic window and one based on the comparison of phones. Experiments indicate that both methods improve the robustness of the general speaker recognition system, and that combining them improves it significantly. At a false alarm rate of 0.1‰, the recall is 50%, which is 6.25 times that of the general speaker recognition system. This basically meets actual engineering requirements.
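The operating point quoted above can be unpacked with a little arithmetic. The sketch below assumes that "6.25 times" is the ratio of the combined system's recall to the baseline recall at the same false alarm rate; the count of one million non-target units is a hypothetical figure chosen only to show the scale of false alarms, not a number from the thesis.

```python
# Figures quoted in the abstract.
FALSE_ALARM_RATE = 0.1 / 1000   # 0.1 per mille
COMBINED_RECALL = 0.50          # recall with both confirmation methods
RECALL_RATIO = 6.25             # assumed: combined recall / baseline recall


def implied_baseline_recall(combined_recall: float, ratio: float) -> float:
    """Baseline recall implied by the stated improvement ratio."""
    return combined_recall / ratio


def expected_false_alarms(nontarget_units: int, false_alarm_rate: float) -> float:
    """Expected number of non-target units wrongly accepted."""
    return nontarget_units * false_alarm_rate


if __name__ == "__main__":
    baseline = implied_baseline_recall(COMBINED_RECALL, RECALL_RATIO)
    print(f"implied baseline recall: {baseline:.1%}")

    # Hypothetical volume of non-target speech units.
    alarms = expected_false_alarms(1_000_000, FALSE_ALARM_RATE)
    print(f"expected false alarms per million non-target units: {alarms:.0f}")
```

Under this reading the general system would recall roughly 8% of target speech at the same false alarm rate. Even at 0.1‰, a large stream of non-target speech still produces a steady number of false alarms, which is why the abstract treats non-target volume as the main obstacle.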

Keywords/Search Tags:

network environment, audio decoding, speaker recognition, high-level semantic window, the compare of phones

Related items

1	End To End Speaker Recognition Under Noisy Environment
2	Multi-speaker Recognition Based On Audio-video Feature Fusion In Smart Environment
3	Person Recognition Based On Audio-Visual Information With Multi-Level Fusion Under Smart Room
4	VoIP Voice Auditing Based On Speaker Recognition In Gigabit High-speed Network Environment
5	Robust Speaker Modeling in Non-Neutral Environments with Application to Large Scale Multi-Speaker Audio Stream
6	Research On Robustness Of Speaker Recognition In Noisy Environment
7	The Speaker Recognition In Noisy Environment
8	Research On Text Dependent Speaker Recognition For Tibetan Amdo Dialect
9	Multi-speaker Tracking Method Based On Audio-visual Feature Fusion Under Intelligent Environment
10	Research Of Speaker Recognition In Low-SNR Environment