Data Extraction Technology Research Based On The Location Of Web Information

Posted on:2012-08-15

Degree:Master

Type:Thesis

Country:China

Candidate:M Y Hou

Full Text:PDF

GTID:2178330335464337

Subject:Computer application technology

Abstract/Summary:

With the rapid development of the Internet, online information has become one of the most important sources of knowledge. The Web can be regarded as a huge database containing a wide variety of valuable information, and information extraction is necessary if users are to obtain the information they want accurately. Many pages today are generated from a Web site's back-end database and share common features: the main part of the page is made up of several local information blocks, and each block consists of several data items. Such pages are defined as data-rich Web pages, and they are an important source of Web information. This thesis studies data extraction from data-rich Web pages. It first surveys domestic and international research on information extraction and compares current data extraction techniques. The comparison indicates that the quality of the extraction rules is the key factor affecting Web information extraction.

To address the existing problems, this thesis presents a Web information extraction system based on data location. The system finds similar pages by comparing URL structure and matching topics, uses XSLT as the extraction rule model, and, through interaction with the user, derives a location method that represents rows and columns with XPath expressions. The system generates extraction rules automatically, so that users can ultimately extract the clearly structured data they are interested in.

Finally, typical e-commerce Web sites are chosen as the experimental data source, and the standard data extraction metrics of recall and precision are used to evaluate the system's performance. The experimental results indicate that the approach is practicable and has good scalability and adaptability.
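
The recall and precision measures mentioned above can be illustrated with a minimal Python sketch. It assumes that extracted and reference records are compared by exact match; the function name precision_recall and the sample records are hypothetical and are not taken from the thesis.

```python
def precision_recall(extracted, relevant):
    """Return (precision, recall) for a set of extracted records.

    precision = correct extracted records / all extracted records
    recall    = correct extracted records / all relevant records
    Records are compared by exact equality (an assumption of this sketch).
    """
    extracted = set(extracted)
    relevant = set(relevant)
    correct = extracted & relevant
    # Guard against division by zero when either set is empty.
    precision = len(correct) / len(extracted) if extracted else 0.0
    recall = len(correct) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical (name, price) records from an e-commerce listing page.
gold = {
    ("Laptop A", "4999"),
    ("Laptop B", "5499"),
    ("Laptop C", "6299"),
    ("Laptop D", "6999"),
}
# Hypothetical system output: three correct records, one spurious record,
# and one gold record ("Laptop D") missed.
output = {
    ("Laptop A", "4999"),
    ("Laptop B", "5499"),
    ("Laptop C", "6299"),
    ("Banner ad", ""),
}

p, r = precision_recall(output, gold)
print(f"precision={p:.2f} recall={r:.2f}")
```

In this hypothetical run, three of the four extracted records are correct and three of the four reference records are found, so both measures are 0.75.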

Keywords/Search Tags:

Web information extraction, information location, extraction rule, XPath location path

Related items

1	Semi-structured Web Information Extraction Technology And Its Application
2	Research On Web Information Extraction Techniques
3	Design And Implementation Of Web Information Extraction Rules
4	Research And Implementation Of Web Information Extraction Based On XML
5	The Study Of Semi-supervised Web Data Extraction Rule Induction Based On User Interaction
6	Based On The Key Pages Of Information To Improve The Hits Algorithm, And Location Information Extraction Method
7	Research On Compound Word Extraction Based On Location Tag
8	Design And Implementation Of Accurate Web Information Extraction System
9	Design And Implementation Of Location Information Cache System
10	Semi-structured In The Xml-based Web Information Extraction