Web Table Oriented Information Extraction General Model

Posted on: 2008-02-06  Degree: Master  Type: Thesis
Country: China  Candidate: Z Zhang  Full Text: PDF
GTID: 2178360212984952  Subject: Computer application technology
Abstract/Summary:
With the development of the Internet, the amount of information on the World Wide Web keeps growing, but the way that information is presented in pages is free-form and unconstrained. As a result, having a computer identify or classify this information automatically has become a major problem, and information extraction techniques have developed in response.

The key technique in information extraction is wrapper generation, which is also its most active research topic. However, existing wrapper generation methods focus on particular domains (e.g. stocks, jobs) and develop a dedicated extraction method for each specific kind of information. These methods are therefore all limited to their own domains.

This thesis proposes a web-table-oriented information extraction model, and designs and implements the model's definition, training, representation, storage and extraction. The model provides a set of structures and methods for defining objects and their elements, and a general process is developed that applies the model to information extraction. This process fetches pages containing information from the World Wide Web, analyzes the page structure using visual information, and finally uses the object model to extract the specific information. In the experiments at the end of the thesis, two object models (mobile and MP3) are first defined, nearly 10,000 tables are then extracted from the Web, and the defined object models are finally used to perform the extraction. The experiments show that the model performs well at information extraction.

This thesis covers the following:
1. The design and implementation of a general-purpose object definition model for information extraction.
2. Extraction of the main table in web pages based on visual information.
3. The process and functions of table information extraction with the object model.
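
To make the idea of an object model applied to web tables concrete, the following is a minimal sketch and not the thesis's implementation. It assumes well-formed, non-nested HTML tables whose first row is a header, and it skips the visual page analysis step entirely. The names `TableParser`, `MOBILE_MODEL` and `extract_objects`, as well as the attribute names and header labels, are hypothetical and chosen only for illustration.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect every <table> as a list of rows of cell strings (no nesting support)."""

    def __init__(self):
        super().__init__()
        self.tables = []    # finished tables
        self._rows = None   # rows of the table being read
        self._row = None    # cells of the row being read
        self._cell = None   # text fragments of the cell being read

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self._rows = []
        elif tag == "tr" and self._rows is not None:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self._rows.append(self._row)
            self._row = None
            self._cell = None
        elif tag == "table" and self._rows is not None:
            self.tables.append(self._rows)
            self._rows = None
            self._row = None
            self._cell = None


# Hypothetical object model: attribute name -> header labels that may denote it.
MOBILE_MODEL = {
    "brand": {"brand", "manufacturer"},
    "model": {"model"},
    "price": {"price", "cost"},
}


def extract_objects(html, model):
    """Return one dict per data row of each table whose header matches the model."""
    parser = TableParser()
    parser.feed(html)
    objects = []
    for rows in parser.tables:
        if len(rows) < 2:
            continue  # need a header row plus at least one data row
        header = [label.lower() for label in rows[0]]
        # Map column index -> attribute name for every recognised header label.
        columns = {
            i: attr
            for i, label in enumerate(header)
            for attr, labels in model.items()
            if label in labels
        }
        if not columns:
            continue  # this table does not describe the modelled object
        for row in rows[1:]:
            objects.append(
                {attr: row[i] for i, attr in columns.items() if i < len(row)}
            )
    return objects
```

Calling `extract_objects(page_html, MOBILE_MODEL)` on a page returns a list of dictionaries keyed by the model's attribute names. Tables whose headers match none of the model's labels are ignored, which is one simple way to keep a single extraction process reusable across different object models.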
Keywords/Search Tags: Information Extraction, Model, Web Table