
Study On Information Extraction For Multimedia Program Catalogue Based On Internet

Posted on: 2009-09-08 | Degree: Master | Type: Thesis
Country: China | Candidate: M Y Li | Full Text: PDF
GTID: 2178360278962658 | Subject: Signal and Information Processing
Abstract/Summary:
With the continuous expansion of digital multimedia services, audiences increasingly demand catalogue information about digital multimedia programs, and this demand is the main motivation for this dissertation. With the rapid development of the Internet, the growth of web data has produced a large amount of semi-structured data containing much information about multimedia programs, which makes it possible to acquire multimedia program catalogue information from the Web. This dissertation therefore takes the automatic extraction of multimedia program catalogue information from the Internet as its task and goal. To address this problem, and building on a study of the general classes of IE (Information Extraction) and the general implementation of IE systems, the dissertation proposes a network automatic extraction system for multimedia program catalogue information (NMPIES).

Web page preprocessing and automatic HTML classification are the preconditions for extracting multimedia program catalogue information, and they are also a key focus of this dissertation. The dissertation proposes a set of web page preprocessing techniques for multimedia program catalogue information, including an HTML-Tree Center Content Decision Method and Feature Selection based on HTML-Tree. These key methods achieve the expected goals well.

For the extraction of multimedia program catalogue information, a theme-based information extraction method is used. Fairly complete catalogue information for multimedia programs is acquired by building a catalogue information template, judging theme similarity, and performing pattern matching.

Finally, the proposed NMPIES system is implemented on the Java platform. Extensive experiments show that NMPIES achieves the expected goals well.
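The abstract does not specify how the HTML-Tree Center Content Decision Method scores tree nodes. Purely as an illustration, the sketch below substitutes a simplified Readability-style heuristic, which is not necessarily the dissertation's method. Each `<p>` node credits its non-link text length to its parent and half of it to its grandparent, and the highest-scoring container is taken as the page's center content. The classes `HtmlNode` and `CenterContentLocator` are hypothetical; a real system would build the tree with an HTML parser rather than by hand.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical minimal DOM node; a real system would get this tree from an HTML parser. */
class HtmlNode {
    final String tag;
    final String text; // text owned directly by this node, excluding its children
    final List<HtmlNode> children = new ArrayList<>();
    HtmlNode parent;

    HtmlNode(String tag, String text) {
        this.tag = tag;
        this.text = text;
    }

    /** Appends a child and returns this node so construction calls can be chained. */
    HtmlNode add(HtmlNode child) {
        child.parent = this;
        children.add(child);
        return this;
    }
}

/** Hypothetical center-content locator using a simplified Readability-style score. */
class CenterContentLocator {
    /** Counts subtree text; if linksOnly is true, only text inside an <a> element counts. */
    static int textLength(HtmlNode n, boolean linksOnly, boolean insideLink) {
        boolean inLink = insideLink || n.tag.equals("a");
        int len = (!linksOnly || inLink) ? n.text.length() : 0;
        for (HtmlNode c : n.children) {
            len += textLength(c, linksOnly, inLink);
        }
        return len;
    }

    /** Collects nodes in document (pre-)order so ties are resolved deterministically. */
    static void collect(HtmlNode n, List<HtmlNode> out) {
        out.add(n);
        for (HtmlNode c : n.children) {
            collect(c, out);
        }
    }

    /** Returns the highest-scoring container, or null if no paragraph has non-link text. */
    static HtmlNode locate(HtmlNode root) {
        List<HtmlNode> all = new ArrayList<>();
        collect(root, all);
        Map<HtmlNode, Double> score = new HashMap<>();
        for (HtmlNode n : all) {
            if (!n.tag.equals("p") || n.parent == null) {
                continue;
            }
            // Non-link characters only, so link-only navigation bars contribute nothing.
            double contrib = textLength(n, false, false) - textLength(n, true, false);
            if (contrib <= 0) {
                continue;
            }
            score.merge(n.parent, contrib, Double::sum);
            if (n.parent.parent != null) {
                score.merge(n.parent.parent, contrib / 2, Double::sum);
            }
        }
        HtmlNode best = null;
        double bestScore = 0;
        for (HtmlNode n : all) {
            double s = score.getOrDefault(n, 0.0);
            if (s > bestScore) {
                best = n;
                bestScore = s;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        HtmlNode nav = new HtmlNode("div", "")
                .add(new HtmlNode("p", "").add(new HtmlNode("a", "Home | Movies | TV")));
        HtmlNode content = new HtmlNode("div", "")
                .add(new HtmlNode("p", "Title: Example Film"))
                .add(new HtmlNode("p", "Director: Jane Doe. Runtime: 120 min."));
        HtmlNode footer = new HtmlNode("div", "").add(new HtmlNode("p", "Copyright 2009"));
        HtmlNode body = new HtmlNode("body", "").add(nav).add(content).add(footer);
        System.out.println(locate(body) == content); // prints true
    }
}
```

Because the grandparent receives only half of each paragraph's score, the `<div>` that directly holds the paragraphs outranks `<body>`, even though `<body>` contains all the text.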
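The abstract also does not say how Feature Selection based on HTML-Tree weights terms. One plausible reading, offered only as an assumption, is that a term matters more when it sits in a prominent node of the tree. The sketch below flattens the page into (tag, text) runs, weights each term occurrence by its enclosing tag, sums the weights per term, and keeps the top-k terms as features for the HTML classifier. The class `TagWeightedFeatureSelector` and its weight table are hypothetical, and the weights are invented for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical tag-weighted feature selector; the weights are illustrative assumptions. */
class TagWeightedFeatureSelector {
    /** A run of text together with the HTML tag that encloses it. */
    static final class TaggedText {
        final String tag;
        final String text;

        TaggedText(String tag, String text) {
            this.tag = tag;
            this.text = text;
        }
    }

    // Assumed weights: text in <title> or headings counts more than ordinary text (1.0).
    static final Map<String, Double> TAG_WEIGHTS = Map.of(
            "title", 3.0, "h1", 2.0, "h2", 2.0, "h3", 2.0);

    /** Sums tag weights per lower-cased word token across all text runs. */
    static Map<String, Double> weightTerms(List<TaggedText> runs) {
        Map<String, Double> weights = new HashMap<>();
        for (TaggedText run : runs) {
            double w = TAG_WEIGHTS.getOrDefault(run.tag, 1.0);
            for (String tok : run.text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
                if (!tok.isEmpty()) {
                    weights.merge(tok, w, Double::sum);
                }
            }
        }
        return weights;
    }

    /** Keeps the k highest-weighted terms; ties are broken alphabetically. */
    static List<String> selectTopK(List<TaggedText> runs, int k) {
        List<Map.Entry<String, Double>> entries = new ArrayList<>(weightTerms(runs).entrySet());
        entries.sort((a, b) -> {
            int byWeight = Double.compare(b.getValue(), a.getValue());
            return byWeight != 0 ? byWeight : a.getKey().compareTo(b.getKey());
        });
        List<String> features = new ArrayList<>();
        for (int i = 0; i < Math.min(k, entries.size()); i++) {
            features.add(entries.get(i).getKey());
        }
        return features;
    }

    public static void main(String[] args) {
        List<TaggedText> page = List.of(
                new TaggedText("title", "Film Review"),
                new TaggedText("h1", "Film Details"),
                new TaggedText("p", "The film stars a cast of newcomers."));
        System.out.println(selectTopK(page, 3)); // prints [film, review, details]
    }
}
```

A production selector would probably also drop stop words and combine these weights with a corpus statistic such as document frequency; both are omitted here to keep the tree-position idea visible.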
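The theme-based extraction step (theme similarity judgment followed by pattern matching against a catalogue information template) might be sketched roughly as follows. Everything here is an assumption for illustration: theme similarity is computed as cosine similarity between term-frequency vectors, the threshold is arbitrary, and the template fields and regular expressions (`title`, `director`, `year`) in the hypothetical class `ThemeBasedExtractor` are not the dissertation's actual template.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical theme-based extractor: a similarity gate followed by template matching. */
class ThemeBasedExtractor {
    /** Term-frequency vector of lower-cased word tokens. */
    static Map<String, Integer> termFreq(String text) {
        Map<String, Integer> tf = new HashMap<>();
        for (String tok : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!tok.isEmpty()) {
                tf.merge(tok, 1, Integer::sum);
            }
        }
        return tf;
    }

    /** Cosine similarity of two term-frequency vectors; 0 if either vector is empty. */
    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0, na = 0, nb = 0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            double av = e.getValue();
            double bv = b.getOrDefault(e.getKey(), 0);
            dot += av * bv;
            na += av * av;
        }
        for (int v : b.values()) {
            nb += (double) v * v;
        }
        return (na == 0 || nb == 0) ? 0 : dot / (Math.sqrt(na) * Math.sqrt(nb));
    }

    // Hypothetical catalogue template: field name -> pattern whose group 1 holds the value.
    static final Map<String, Pattern> TEMPLATE = new LinkedHashMap<>();
    static {
        TEMPLATE.put("title", Pattern.compile("Title:\\s*([^\\n]+)"));
        TEMPLATE.put("director", Pattern.compile("Director:\\s*([^\\n]+)"));
        TEMPLATE.put("year", Pattern.compile("Year:\\s*(\\d{4})"));
    }

    /** Returns the matched template fields, or an empty map if the page is off-theme. */
    static Map<String, String> extract(String page, String themeText, double threshold) {
        Map<String, String> record = new LinkedHashMap<>();
        if (cosine(termFreq(page), termFreq(themeText)) < threshold) {
            return record; // page judged not to be about the catalogue theme
        }
        for (Map.Entry<String, Pattern> field : TEMPLATE.entrySet()) {
            Matcher m = field.getValue().matcher(page);
            if (m.find()) {
                record.put(field.getKey(), m.group(1).trim());
            }
        }
        return record;
    }

    public static void main(String[] args) {
        String theme = "film movie title director year cast";
        String page = "Title: Example Film\nDirector: Jane Doe\nYear: 2008\n";
        System.out.println(extract(page, theme, 0.2));
        // prints {title=Example Film, director=Jane Doe, year=2008}
    }
}
```

Gating on theme similarity before matching keeps the template patterns from firing on unrelated pages that happen to contain a string such as "Title:".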
Keywords/Search Tags: Multimedia Program Catalogue Information, Automatic HTML Classification, Information Extraction, Feature Selection