Automatic Content Labeling System For Broadcast News

Posted on: 2012-06-24
Degree: Master
Type: Thesis
Country: China
Candidate: X J Yu
GTID: 2248330362968147
Subject: Information and Communication Engineering
Abstract/Summary:
With the rapid development of computer and network technology, data storage capacity and transmission speed keep increasing. Broadcast news programs are one of the most important ways of obtaining information. To organize and manage the ever-increasing volume of news data effectively, structured content labeling must be introduced. Manual labeling is time-consuming and subjective. In this paper we design and implement an automatic content labeling system that adds multi-level tags to the data.

This paper focuses on broadcast news audio data and has two main parts. The first is automatic audio segmentation and classification, which splits the audio into short segments and classifies the segments by audio type and by speaker. The second is automatic story segmentation and summarization, which splits the news audio into stories using semantic information and extracts the subject of each story to describe the news content.

In the audio segmentation and classification part, a pitch-based method is used to detect silence. A Bayesian criterion is then applied to further separate different audio types and speakers. To determine the type of each homogeneous segment, we use a Gaussian Mixture Model to classify audio segments into speech, music and noise. The Affinity Propagation clustering algorithm is applied to find the speech segments that belong to the same speaker.

Semantic information and acoustic information are both useful for story segmentation of broadcast news audio. To suit the characteristics of speech recognition output, we improve the SeLeCT algorithm for semantic-based text segmentation. A rule-based multi-information fusion method is proposed for story segmentation, which improves segmentation performance.

For each individual story, a summary must be extracted for easy browsing. Both VSM-based methods, such as Maximal Marginal Relevance and Latent Semantic Analysis, and a classification-based method using a Support Vector Machine are applied to summary sentence selection, and the experiments show that the SVM classifier gives better results. Keywords are also obtained by calculating an importance score.

The system in this paper can automatically label broadcast news audio with sentence boundary information, classification information, speaker information, story boundaries and topic information, which provides a variety of ways to retrieve audio data.
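
As an illustration of the VSM-based summary sentence selection mentioned above, the following is a minimal sketch of Maximal Marginal Relevance over simple term-frequency vectors. It is not the thesis implementation: the function names (`tf_vector`, `cosine`, `mmr_select`), the whitespace tokenization, the use of the whole story as the query, and the trade-off parameter `lam` are all assumptions made for this example.

```python
import math
from collections import Counter


def tf_vector(text):
    # Assumed representation: raw term frequencies over whitespace tokens.
    return Counter(text.lower().split())


def cosine(a, b):
    # Cosine similarity between two sparse term-frequency vectors.
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def mmr_select(sentences, k=2, lam=0.7):
    # Greedy MMR: each step picks the sentence that is most relevant to the
    # whole story while being least similar to the sentences already chosen.
    vectors = [tf_vector(s) for s in sentences]
    story = sum(vectors, Counter())  # the whole story acts as the query
    selected = []
    candidates = list(range(len(sentences)))

    def score(i):
        relevance = cosine(vectors[i], story)
        redundancy = max(
            (cosine(vectors[i], vectors[j]) for j in selected), default=0.0
        )
        return lam * relevance - (1 - lam) * redundancy

    while candidates and len(selected) < k:
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)

    # Return the chosen sentences in their original order.
    return [sentences[i] for i in sorted(selected)]


if __name__ == "__main__":
    story_sentences = [
        "the flood damaged the city bridge",
        "the flood damaged the city bridge",
        "rescue teams reached the village today",
    ]
    print(mmr_select(story_sentences, k=2, lam=0.5))
```

With `lam` below 1, the redundancy term penalizes a sentence that repeats one already selected, so the summary covers more of the story; with `lam=1.0` the selection reduces to plain relevance ranking.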
Keywords/Search Tags: Automatic Labeling, Automatic Segmentation and Classification, Automatic Story Segmentation, Automatic Summarization