
An Automatic Labeling System For Broadcast News

Posted on: 2011-10-09; Degree: Master; Type: Thesis
Country: China; Candidate: K J He; Full Text: PDF
GTID: 2178360308961335; Subject: Pattern Recognition and Intelligent Systems
Abstract/Summary:
With the development of computer, network, and communication technology, large amounts of audio and video content have become available over the Internet. Mining these data sources, for example to make them searchable through audio indexing engines, has become an active area of research, and it requires a large-vocabulary speech recognition system with improved performance. As a typical kind of multimedia data, broadcast news is widely studied by speech researchers.
Large amounts of speech with high-quality transcripts are indispensable for training acoustic models in automatic speech recognition (ASR), and the transcripts are typically produced by humans. However, today's recognition systems are trained on hundreds or even thousands of hours of speech, and the continuing growth of training corpora may make high-quality manual labeling prohibitively expensive.
Fortunately, in many cases audio data and the corresponding transcripts are both available on the Internet. In such cases the problem is not to recognize words from the audio but to align the given text with the audio. This paper addresses the problem of aligning long speech recordings to their transcripts.
We present a recursive technique based on speech recognition and dynamic programming. The algorithm can be summarized as a recursive speech recognition problem with a gradually restricted dictionary and language model extracted from the corresponding transcript. The approach relies on finding islands of confidence (called anchors): dynamic programming aligns the correct transcript with the words hypothesized by the recognizer, and the reliably matched regions become anchors. The transcript and the audio are then partitioned into aligned and unaligned segments according to the anchors. These steps are repeated on every unaligned segment until a termination condition is reached.
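The anchor-finding step can be illustrated with a minimal Python sketch. It is not the thesis implementation: it assumes the recognizer output is already available as a list of words, uses a longest-common-subsequence alignment as the dynamic programming step, and treats any run of at least `min_run` consecutive matching words as an anchor. The names `lcs_pairs`, `find_anchors`, `unaligned_gaps`, and `min_run` are hypothetical.

```python
def lcs_pairs(ref, hyp):
    """Return index pairs (i, j) of one longest common subsequence of ref and hyp."""
    n, m = len(ref), len(hyp)
    # table[i][j] holds the LCS length of ref[i:] and hyp[j:]
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if ref[i] == hyp[j]:
                table[i][j] = table[i + 1][j + 1] + 1
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])
    # Walk the table forward to recover the matched index pairs.
    pairs, i, j = [], 0, 0
    while i < n and j < m:
        if ref[i] == hyp[j]:
            pairs.append((i, j))
            i += 1
            j += 1
        elif table[i + 1][j] >= table[i][j + 1]:
            i += 1
        else:
            j += 1
    return pairs


def find_anchors(ref, hyp, min_run=3):
    """Keep runs of matches that are consecutive in both sequences (assumed anchor rule)."""
    anchors, run = [], []
    for pair in lcs_pairs(ref, hyp):
        if run and pair == (run[-1][0] + 1, run[-1][1] + 1):
            run.append(pair)
        else:
            if len(run) >= min_run:
                anchors.append((run[0], run[-1]))
            run = [pair]
    if len(run) >= min_run:
        anchors.append((run[0], run[-1]))
    return anchors


def unaligned_gaps(ref_len, anchors):
    """Return transcript ranges [start, end) that no anchor covers."""
    gaps, cursor = [], 0
    for first, last in anchors:
        if first[0] > cursor:
            gaps.append((cursor, first[0]))
        cursor = last[0] + 1
    if cursor < ref_len:
        gaps.append((cursor, ref_len))
    return gaps


# Illustrative data only: a transcript and an imperfect recognizer hypothesis.
ref = "the president said today that the economy is growing faster than expected".split()
hyp = "the president said to day at the economy is growing vast than expected".split()
anchors = find_anchors(ref, hyp)
gaps = unaligned_gaps(len(ref), anchors)
print(anchors)  # each anchor: ((ref_first, hyp_first), (ref_last, hyp_last))
print(gaps)     # transcript ranges that would be re-recognized in the next pass
```

In a full system, each gap would be passed back to the recognizer with a dictionary and language model restricted to the words of that gap, and the procedure would recurse until no gaps remain or a stopping rule applies.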
In this paper we improve the recursive approach for broadcast news in three main ways. First, our approach explicitly searches in units of sentences, which makes manual correction easier. Second, our approach uses Dynamic Time Warping (DTW), a dynamic programming technique, to align the correct transcript with the hypothesis text; because DTW tolerates a few errors, a small number of errors in the transcript do not cause the alignment to fail. Third, for a segment in which no anchor can be found, we use acoustic model re-estimation rather than changing the DP threshold in the recursion, since some segments of broadcast news audio are too poor in quality to be recognized.
Based on the recursive alignment algorithm and speech segmentation (voice activity detection, speech/music discrimination, and speaker segmentation), we build an automatic labeling system for broadcast news. It performs automatic labeling in three aspects: content contract, speaker, and speech content. The completeness ratio is 89.2%, and 98.9% of the sentences are off by less than 1 second from the reference tags. The system can save a large amount of human labor.
To improve the performance of the labeling system, this paper also studies speech segmentation techniques and evaluates them on broadcast news audio.
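The idea of error-tolerant, sentence-level matching can be sketched as follows. This is only an illustration under stated assumptions: it swaps in a word-level edit-distance alignment (a close relative of DTW) rather than the DTW formulation used in the thesis, and the function name `locate_sentence` and the tolerance `max_error_rate` are hypothetical.

```python
def locate_sentence(sentence, hyp, max_error_rate=0.3):
    """Find the span of hyp that best matches sentence (both are lists of words).

    Returns (start, end, cost) so that hyp[start:end] is the matched span,
    or None when the best match needs more than max_error_rate * len(sentence) edits.
    """
    n, m = len(sentence), len(hyp)
    # cost[i][j]: fewest edits to match sentence[:i] to some span of hyp ending at j
    # start[i][j]: index in hyp where that span begins
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    start = [[0] * (m + 1) for _ in range(n + 1)]
    for j in range(m + 1):
        start[0][j] = j  # a match may begin anywhere in hyp at no cost
    for i in range(1, n + 1):
        cost[i][0] = i
        for j in range(1, m + 1):
            sub = cost[i - 1][j - 1] + (sentence[i - 1] != hyp[j - 1])
            dele = cost[i - 1][j] + 1  # transcript word missing from hyp
            ins = cost[i][j - 1] + 1   # extra word in hyp
            best = min(sub, dele, ins)
            cost[i][j] = best
            if best == sub:
                start[i][j] = start[i - 1][j - 1]
            elif best == dele:
                start[i][j] = start[i - 1][j]
            else:
                start[i][j] = start[i][j - 1]
    end = min(range(m + 1), key=lambda j: cost[n][j])
    if cost[n][end] > max_error_rate * n:
        return None  # too many mismatches: treat the sentence as unaligned
    return start[n][end], end, cost[n][end]


# Illustrative data only: the hypothesis contains one misrecognized word.
hyp = ("good evening here is the news the prime minster arrived in "
       "beijing today for talks and now sport").split()
sentence = "the prime minister arrived in beijing today for talks".split()
print(locate_sentence(sentence, hyp))
```

A sentence that is located within the tolerance could serve as an anchor despite isolated recognition or transcript errors, while a sentence that returns `None` would be left for a later pass, for example after acoustic model re-estimation.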
Keywords/Search Tags: automatic labeling, speech alignment, dynamic programming, speech segmentation, acoustic re-estimation