Research On Web Information Extraction Technology Based On

Posted on: 2012-09-04
Degree: Master
Type: Thesis
Country: China
Candidate: C Cui
Full Text: PDF
GTID: 2208330368476287
Subject: Computer software and theory
Abstract/Summary:
Since the Web 2.0 concept was proposed in 2004, the way people use the Internet has changed considerably. Previously, people mainly browsed news on websites. Web 2.0 is primarily interactive: users no longer just read news, but can also exchange information, hold conversations, and edit site content. Popular blogs, virtual communities, and online encyclopedias all emphasize the interactive user experience, and this is the direction in which Web technology is developing.
With the wide adoption of Web 2.0, information extraction faces a new problem. Websites built on the AJAX framework load their content through asynchronous JavaScript, and traditional Web information extraction cannot extract information from such sites. Traditional techniques are powerless against AJAX sites even as AJAX-based sites continue to emerge, so they can no longer retrieve the information users are interested in. This problem has drawn wide attention from scholars, and research on AJAX-based Web information extraction is significant for both the theory and the application of Internet technology.
This paper reviews the development of information extraction, its research status, its applications, the relevant technologies and evaluation metrics, and several traditional information extraction techniques, and then introduces AJAX-based Web information extraction technology and its application. Building on traditional extraction methods for static web pages, it proposes and implements information extraction for dynamic web pages. The URL of the page to be extracted is first passed to page processing, which parses and analyzes the page's JavaScript code and then constructs a new DOM tree for the page. After page processing, the page enters the information extraction phase. This addresses the key technical problem of extracting information that is loaded through asynchronous JavaScript interaction, and so achieves the goal of extracting information from AJAX framework websites.
On the basis of traditional information extraction, the paper proposes a research scheme for AJAX-based Web information extraction and realizes information extraction from AJAX framework websites. The page analysis, page processing, and rule generation modules work together, offering a theoretical and technical method for AJAX framework websites, and a new solution and a simple design for AJAX information extraction systems.
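The extraction phase that follows page processing can be sketched as below. This is a minimal illustration, not the thesis's implementation: it assumes a page-processing step has already executed the page's JavaScript and produced the rendered HTML, and it uses only Python's standard `html.parser`. The names `Node`, `DomBuilder`, `build_dom`, and `extract`, and the (tag, class) rule format, are hypothetical and do not come from the abstract.

```python
from html.parser import HTMLParser

# Void elements never get a closing tag, so they are not pushed as open elements.
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}


class Node:
    """One element of a simplified DOM tree (hypothetical structure)."""

    def __init__(self, tag, attrs=None, parent=None):
        self.tag = tag
        self.attrs = dict(attrs or [])
        self.parent = parent
        self.children = []
        self.text = ""


class DomBuilder(HTMLParser):
    """Builds a Node tree from HTML that is assumed to be already rendered."""

    def __init__(self):
        super().__init__()
        self.root = Node("#document")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs, self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:
            self.current = node

    def handle_endtag(self, tag):
        # Walk up to the nearest open element with this tag; ignore stray end tags.
        node = self.current
        while node is not None and node.tag != tag:
            node = node.parent
        if node is not None and node.parent is not None:
            self.current = node.parent

    def handle_data(self, data):
        self.current.text += data.strip()


def build_dom(html):
    builder = DomBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


def extract(root, tag, css_class):
    """Return the text of every node matching a (tag, class) rule, in document order."""
    results = []
    stack = [root]
    while stack:
        node = stack.pop()
        classes = (node.attrs.get("class") or "").split()
        if node.tag == tag and css_class in classes:
            results.append(node.text)
        stack.extend(reversed(node.children))
    return results


# Assumed output of the page-processing step for an AJAX page (illustrative data).
rendered_html = """
<div id="feed">
  <p class="item">First post</p>
  <p class="item">Second post</p>
  <p class="ad">Sponsored</p>
</div>
"""
titles = extract(build_dom(rendered_html), "p", "item")
print(titles)
```

In this sketch the rule is written by hand; in a system like the one the abstract describes, a rule generation module would presumably supply it.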
Keywords/Search Tags: Web information extraction, dynamic Web pages, DOM, dynamic Web processing engine, AJAX