
The Design And Implementation Of Web Information Extraction System

Posted on: 2014-08-13
Degree: Master
Type: Thesis
Country: China
Candidate: S Pi
Full Text: PDF
GTID: 2268330425475645
Subject: Computer technology
Abstract/Summary:
The Internet is developing rapidly as a basic network, and the World Wide Web (Web) built on it plays an increasingly important role in people's daily lives. The massive amount of information hosted on the Web is an important source of information for everyday use, so finding a method that makes it easy for people to explore this information is increasingly important. Web information extraction is an effective solution in many respects. The system described here is intended mainly for Web information extraction scenarios in e-commerce products, product information and consulting, and similar industries; it can also help ordinary users obtain large amounts of Web information that interests them.

The goal of the study is to design and implement a Web information extraction system. The system is meant to meet people's need for customized information from the Web, and to collect information in large quantities as the data input for downstream Web information processing systems, which ultimately yield Web data products that serve different users' information needs.

The thesis studies the definition of Web information extraction and proposes an approach to solving it; classifies and defines the information that can be obtained from the Web, valuable information in particular; and defines a data model for formatted Web information.
The data model matters both for the design of the Web information extraction algorithms and for the way those algorithms access Web data. Web information extraction algorithms are designed and implemented for different objectives. For example, a template-based information extraction algorithm lets users obtain information from any position on a user-defined page, and a list-based automatic extraction algorithm lets users obtain information from list-detail pages. A server-side HTTP service framework is designed and implemented so that information extraction can be offered over HTTP, and an extraction task engine is designed and implemented to provide customizable, hosted, task-level information extraction.

Finally, following the software development life cycle, the author details the design and implementation of the system through requirements analysis, system design and implementation, and system testing. ...
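
The abstract does not give the details of the template-based algorithm or the data model, so the following is only a minimal sketch of the general idea, under stated assumptions. It assumes a template is a mapping from a field name to the CSS class that marks that field on the page, and that an extracted item is a flat set of named text fields. The names `Record`, `TemplateExtractor`, and `extract` are hypothetical and are not taken from the thesis.

```python
from dataclasses import dataclass, field
from html.parser import HTMLParser

# Elements that never have a closing tag; they must not affect nesting depth.
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}


@dataclass
class Record:
    """Hypothetical data model: one extracted item as named text fields."""
    fields: dict = field(default_factory=dict)


class TemplateExtractor(HTMLParser):
    """Collects the text of elements whose class matches a template rule."""

    def __init__(self, template):
        super().__init__()
        # Assumed template format: {field name: CSS class marking the field}.
        self.class_to_field = {cls: name for name, cls in template.items()}
        self.record = Record()
        self._current = None  # field currently being captured, if any
        self._depth = 0       # nesting depth inside the captured element

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self._current is not None:
            self._depth += 1
            return
        classes = (dict(attrs).get("class") or "").split()
        for cls in classes:
            if cls in self.class_to_field:
                self._current = self.class_to_field[cls]
                self._depth = 1
                self.record.fields[self._current] = ""
                break

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or self._current is None:
            return
        self._depth -= 1
        if self._depth == 0:
            # The captured element has closed: trim and stop capturing.
            text = self.record.fields[self._current]
            self.record.fields[self._current] = text.strip()
            self._current = None

    def handle_data(self, data):
        if self._current is not None:
            self.record.fields[self._current] += data


def extract(html, template):
    """Apply a template to one page and return the extracted record."""
    parser = TemplateExtractor(template)
    parser.feed(html)
    parser.close()
    return parser.record


page = (
    '<div class="item"><h1 class="title">Phone <b>X</b></h1>'
    '<span class="price"> 199 </span></div>'
)
result = extract(page, {"name": "title", "price": "price"})
print(result.fields)
```

A list-based extractor in the same spirit would presumably apply such a template to every repeated item of a list page and then follow each item's link to its detail page, but that step is not sketched here because the abstract does not describe how lists are detected.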
Keywords/Search Tags: Web information mining, Web information extraction, template-based information extraction, list information extraction