
Data Extraction Technology Research Based On The Location Of Web Information

Posted on: 2012-08-15    Degree: Master    Type: Thesis
Country: China    Candidate: M Y Hou
GTID: 2178330335464337    Subject: Computer application technology
Abstract/Summary:
With the rapid development of the Internet, online information has become one of the most important sources of knowledge. The Web can be regarded as a huge database containing a wide variety of valuable information, and information extraction is needed so that users can accurately obtain the information they want.

Many Web pages today present information stored in a site's back-end database, and these pages share common features: the main body of such a page is made up of several local information blocks, and each block consists of a number of data items. Such pages are defined as data-rich Web pages, and they are an important source of Web information. This thesis focuses on data extraction from data-rich Web pages. It first reviews the state of information extraction research in China and abroad and compares current data extraction techniques, concluding that the quality of the extraction rules is the key factor in the effectiveness of Web information extraction.

To address the existing problems, this thesis presents a Web information extraction system based on data location. The system finds similar pages by comparing URL structures and matching page themes, uses XSLT as the model for extraction rules, and, through interaction with the user, derives a location method that expresses data positions as row-and-column XPath paths. The system then generates extraction rules automatically, so that users can extract clearly structured data of interest.

Finally, typical e-commerce Web sites are chosen as the experimental data source, and the system's performance is evaluated with the standard data extraction metrics of recall and precision. The experimental results indicate that the approach is practical and has good scalability and adaptability.
Keywords/Search Tags: Web information extraction, information location, extraction rule, XPath location path
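The abstract does not specify the system's rule format, so the following is only a minimal Python sketch of the row-and-column location idea, not the author's implementation. It assumes a hypothetical, well-formed XHTML product table, builds positional paths such as `.//table[@id='products']/tr[2]/td[1]`, and evaluates them with the XPath subset supported by the standard library's `xml.etree.ElementTree`. The thesis itself uses XSLT extraction rules, which this sketch does not reproduce. The helper names (`row_column_path`, `column_path`, `extract_column`, `precision_recall`) and the sample data are illustrative assumptions, and the last step computes precision and recall against a made-up ground truth.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed XHTML fragment standing in for a data-rich page:
# one information block (the table) made of records (rows) of data items (cells).
PAGE = """
<html><body>
  <table id="products">
    <tr><td>Phone A</td><td>1999</td><td>In stock</td></tr>
    <tr><td>Phone B</td><td>2499</td><td>Sold out</td></tr>
    <tr><td>Phone C</td><td>2999</td><td>In stock</td></tr>
  </table>
</body></html>
"""


def row_column_path(row: int, col: int) -> str:
    """Positional path to a single cell (1-based, as in XPath). Hypothetical helper."""
    return f".//table[@id='products']/tr[{row}]/td[{col}]"


def column_path(col: int) -> str:
    """Path that fixes only the column, so it matches that cell in every row."""
    return f".//table[@id='products']/tr/td[{col}]"


def extract_column(root: ET.Element, col: int) -> list:
    """Apply the column path and return the text of each matched cell."""
    return [td.text for td in root.findall(column_path(col))]


def precision_recall(extracted: list, expected: list) -> tuple:
    """Precision = correct / extracted, recall = correct / expected.

    Records are compared as sets, so duplicate values are collapsed.
    """
    extracted_set, expected_set = set(extracted), set(expected)
    correct = len(extracted_set & expected_set)
    precision = correct / len(extracted_set) if extracted_set else 0.0
    recall = correct / len(expected_set) if expected_set else 0.0
    return precision, recall


if __name__ == "__main__":
    root = ET.fromstring(PAGE.strip())
    # A single cell, as a user might point at it interactively.
    print(root.find(row_column_path(2, 1)).text)  # Phone B
    # Generalize that choice to the whole column.
    prices = extract_column(root, 2)
    print(prices)  # ['1999', '2499', '2999']
    # Hypothetical ground truth containing one price the page does not show.
    p, r = precision_recall(prices, ["1999", "2499", "2999", "3499"])
    print(p, r)  # 1.0 0.75
```

Dropping the row index while keeping the column index is what lets one user-selected cell generalize to every record in the block. Real-world HTML is often not well-formed XML, so a practical version would first need an HTML parser that produces a tree before any path is applied.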