Research On Web Information Extraction Technology Based On

Posted on:2012-09-04

Degree:Master

Type:Thesis

Country:China

Candidate:C Cui

GTID:2208330368476287

Subject:Computer software and theory

Abstract/Summary:

Since the Web 2.0 concept was proposed around 2004, the way people use the Internet has changed considerably. Previously, people mainly used the Internet to browse news on websites. Web 2.0 sites are instead interactive: users no longer just browse web news, they can also exchange information, hold conversations, and edit site content. Popular blogs, virtual communities, and online encyclopedias all emphasize the interactive user experience, and this is the direction in which Web technology is developing.

With the wide adoption of Web 2.0, information extraction faces a new problem. AJAX-based websites are built on asynchronous JavaScript, and traditional Web information extraction cannot extract the information in such sites. As AJAX-based sites continue to emerge, traditional Web information extraction technology is increasingly unable to retrieve the information users are interested in. The problem has attracted wide attention from researchers, and research on AJAX-based Web information extraction is of great significance to both the theory and the application of Internet technology.

This thesis introduces the development and research status of information extraction, its applications, the relevant technologies and evaluation metrics, and several traditional information extraction techniques, and then introduces AJAX-based Web information extraction technology and its application. Building on traditional extraction methods for static web pages, it proposes and implements an extraction method for dynamic web pages. The page at the URL to be extracted is first pre-processed: its JavaScript code is parsed and processed, and the page's DOM tree is then rebuilt from the result. After page processing, the information extraction phase begins. This resolves the key technical problems of extracting information from AJAX-based websites and from asynchronous JavaScript interactions.

The thesis achieves information extraction from AJAX-based websites and, building on traditional information extraction, proposes a research scheme for AJAX-based Web information extraction. The cooperation of the page analysis, page processing, and rule generation modules provides a theoretical and technical approach to extraction from AJAX-based websites, offering a new solution and a simple design for AJAX-oriented information extraction systems.
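
The last two stages described in the abstract, rebuilding a DOM tree from the processed page and then applying an extraction rule to it, might look roughly like the following Python sketch. It is an illustration only, not the thesis's implementation: it assumes the JavaScript has already been executed elsewhere and the resulting HTML is available as a string, and the names `Node`, `DomBuilder`, `build_dom`, and `extract`, as well as the single tag/attribute rule format, are hypothetical.

```python
from html.parser import HTMLParser

# Elements that never have children or a closing tag.
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}


class Node:
    """One element of a simplified DOM tree (hypothetical structure)."""

    def __init__(self, tag, attrs=None, parent=None):
        self.tag = tag
        self.attrs = dict(attrs or [])
        self.parent = parent
        self.children = []
        self.text = ""  # text directly inside this element


class DomBuilder(HTMLParser):
    """Builds a Node tree from HTML that has already been rendered."""

    def __init__(self):
        super().__init__()
        self.root = Node("#document")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs, self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:
            self.current = node  # descend into the new element

    def handle_endtag(self, tag):
        # Climb to the nearest open element with this tag, if there is one.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent

    def handle_data(self, data):
        self.current.text += data


def build_dom(html):
    builder = DomBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


def extract(root, tag, attr_name, attr_value):
    """Return the text of every node matching one tag/attribute rule."""
    results = []
    stack = [root]
    while stack:
        node = stack.pop()
        if node.tag == tag and node.attrs.get(attr_name) == attr_value:
            results.append(node.text.strip())
        # Reverse so that children are visited in document order.
        stack.extend(reversed(node.children))
    return results


# Assumed input: HTML as it looks after asynchronous content has loaded.
rendered = (
    '<html><body><ul>'
    '<li class="item">A</li><li class="item"> B <br></li><li>C</li>'
    '</ul></body></html>'
)
items = extract(build_dom(rendered), "li", "class", "item")
```

In a real system the rule would come from a rule generation step rather than being written by hand, and the rendered HTML would come from a component that actually executes the page's asynchronous JavaScript; neither is shown here.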

Keywords/Search Tags:

Web information extraction, dynamic Web pages, DOM, dynamic Web processing engine, AJAX

Related items

1	Research On Efficient Web Data Extraction Technology Based On Visual Information
2	Design And Implementation Of Train Diagram's Data Management System Base On Ajax Technology
3	A dynamic engine for the interpretation and execution of Java server pages
4	Research Of Dynamic Comment Extraction Based On Web
5	Design And Implementation Of A Directional Information Extraction Model For Dynamic Web Pages
6	Research On Algorithm Of Crawling Ajax Dynamic Web Pages Based On User Interface State Changes
7	The Research Of Dynamic Web Pages Information Extraction Algorithm Based On Sequence Alignment
8	Crm Project Dynamic Engine Design And Realization
9	Research On Automatically Detecting The Output Exception Of Dynamic Web Pages Based On Concolic Testing
10	The Study And Implementation On The Key Problems Of Intelligent Search Engine Technology