Automatic Content Labeling System For Broadcast News

Posted on:2012-06-24

Degree:Master

Type:Thesis

Country:China

Candidate:X J Yu

Full Text:PDF

GTID:2248330362968147

Subject:Information and Communication Engineering

Abstract/Summary:

With the rapid development of computer and network technology, data storage capacity and transmission speed keep increasing. Broadcast news programs are one of the most important ways of obtaining information. In order to organize and manage the ever-increasing volume of news data effectively, structured content labeling must be introduced. Manual labeling is time-consuming and subjective. In this paper we design and implement an automatic content labeling system that adds multi-level tags to the data.

This paper focuses on broadcast news audio data and has two main parts. The first is automatic audio segmentation and classification, which splits the audio into short segments and classifies the segments by audio type and by speaker. The second is automatic story segmentation and summarization, which splits the news audio into stories using semantic information and extracts the subject of each story to describe the news content.

In the audio segmentation and classification part, a pitch-based method is used to detect silence. A Bayesian criterion is then applied to further separate different audio types and speakers. To determine the type of each homogeneous segment, we use a Gaussian Mixture Model to classify audio segments into speech, music, and noise. The Affinity Propagation clustering algorithm is applied to find the speech segments that belong to the same speaker.

Semantic information and acoustic information are both useful for story segmentation of broadcast news audio. To suit the characteristics of speech recognition output, we improve the SeLeCT algorithm for semantics-based text segmentation. A rule-based multi-information fusion method is proposed for story segmentation, which improves segmentation performance.

For each story in the news, a summary must be extracted for easy browsing. Both VSM-based methods, such as Maximal Marginal Relevance and Latent Semantic Analysis, and a classification-based method using a Support Vector Machine are applied to summary sentence selection, and the experiments show that the SVM classifier gives better results. Keywords are also obtained by calculating an importance score.

The system in this paper can automatically label broadcast news audio with sentence segmentation boundary information, classification information, speaker information, story boundaries, and topic information, which provides a variety of ways to retrieve audio data.
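
The abstract names Maximal Marginal Relevance as one of the VSM-based sentence selection methods but does not give its formulation. As an illustration only, the following is a minimal sketch of generic MMR selection, assuming bag-of-words term counts, cosine similarity, and the whole story used as the query. The names `tokenize`, `cosine`, `mmr_select`, and the trade-off parameter `lam` are hypothetical and are not taken from the thesis.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a sentence and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors (Counters)."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def mmr_select(sentences, k=2, lam=0.5):
    """Pick up to k sentence indices by Maximal Marginal Relevance.

    Assumption: relevance is similarity to the whole story (the sum of
    all sentence vectors); redundancy is the highest similarity to any
    sentence already selected. lam weighs relevance against redundancy.
    """
    vectors = [Counter(tokenize(s)) for s in sentences]
    story = Counter()
    for v in vectors:
        story.update(v)

    selected = []
    candidates = list(range(len(sentences)))
    while candidates and len(selected) < k:
        def score(i):
            relevance = cosine(vectors[i], story)
            redundancy = max(
                (cosine(vectors[i], vectors[j]) for j in selected),
                default=0.0,
            )
            return lam * relevance - (1.0 - lam) * redundancy

        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected
```

With `lam=1.0` the selection reduces to ranking by relevance alone; lower values penalize sentences that repeat what has already been selected, which matters for speech recognition transcripts where adjacent sentences often overlap.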

Keywords/Search Tags:

Automatic Labeling, Automatic Segmentation and Classification, Automatic Story Segmentation, Automatic Summarization

Related items

1	Research On Automatic Summarization System Based On RSS
2	Study On The Theory & Practice Of Automatic Indexing Of WWW Science And Technology Information Resources
3	Research And Implementation Of Web Text Automatic Summarization System Based On HTMM
4	Research On Classification And Automatic Summarization Of Web Information
5	An Automatic Labeling System For Broadcast News
6	Study On Method Of Automatic Segmentation And Recognition Of Blood Cell Images
7	Design Of Audio Publications Automatic Segmentation System
8	Research On Automatic Summarization And The Application In Proposal Management
9	The Study, Based On Themes By Web Document Automatic Summarization
10	The Technology Of Automatic Text Summarization Based On Deep Learning