Research And Implementation Of Search Engine Prototype Based On Deep Web Crawler

Posted on:2011-09-16

Degree:Master

Type:Thesis

Country:China

Candidate:K Tan

Full Text:PDF

GTID:2178330332985827

Subject:Computer software and theory

Abstract/Summary:

With the rapid development of the Internet, simple, extensible, platform-independent web technology has gradually prevailed, and dynamic web pages are replacing static ones. Given the characteristics of dynamic pages, the emergence of the deep web has become an inevitable trend. Search engines give users an effective way to collect information from the web. Traditional search engines, however, mainly present unstructured data, whereas deep web data is structured and cannot be reached through them. As deep web research has deepened, obtaining deep web data through search engines has become a new task in the field.

This paper constructs a search engine prototype based on a deep web crawler and the core framework of Lucene. In realizing the prototype, we mainly study deep web search, deep web query interface judgment, deep web surface preprocessing, and the selection of inputs for associated form query templates. Query interface judgment is based on the DOM tree.

For deep web surface preprocessing, this paper proposes an algorithm for selecting associated form query templates. The algorithm models the form input values and analyzes the form page query process. A weighting technique is then used to select the inputs that fill the query form templates. Finally, the corresponding backend database query links are obtained, and through them the deep web data.

For the search engine architecture, this paper uses the open-source Lucene search engine framework, which offers two core classes: the core index class and the core search class. The crawler fetches the data content and saves it into the Lucene index repository. The core search class then provides the search query interface to users, which completes the search engine prototype architecture based on the deep web crawler.
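
The abstract says that query interface judgment is based on the DOM tree, but it does not give the rules. The Python sketch below is one possible illustration, not the thesis's algorithm: it walks the parsed page, collects the input elements under each form, and applies a simple assumed rule (text-like or select inputs present, no password field) to separate query forms from login forms. The names `FormCollector`, `is_query_interface`, `find_query_interfaces` and `SAMPLE_PAGE` are hypothetical and do not come from the thesis.

```python
from html.parser import HTMLParser


class FormCollector(HTMLParser):
    """Collects, for each <form>, the input-like elements nested inside it."""

    def __init__(self):
        super().__init__()
        self.forms = []       # one dict per form encountered
        self._current = None  # the form currently open, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self._current = {"action": attrs.get("action") or "", "fields": []}
            self.forms.append(self._current)
        elif self._current is not None and tag in ("input", "select", "textarea"):
            # For <input>, record its type (default "text"); otherwise the tag name.
            kind = (attrs.get("type") or "text").lower() if tag == "input" else tag
            self._current["fields"].append(kind)

    def handle_endtag(self, tag):
        if tag == "form":
            self._current = None


def is_query_interface(form):
    """Assumed rule: no password field, and at least one text-like or select input."""
    fields = form["fields"]
    if "password" in fields:
        return False  # most likely a login or registration form
    return any(kind in ("text", "search", "select", "textarea") for kind in fields)


def find_query_interfaces(html):
    """Return the forms in `html` that the assumed rule accepts as query interfaces."""
    parser = FormCollector()
    parser.feed(html)
    return [form for form in parser.forms if is_query_interface(form)]


# Hypothetical page with one login form and one search form.
SAMPLE_PAGE = """
<form action="/login">
  <input type="text" name="user"><input type="password" name="pw">
</form>
<form action="/books/search">
  <input type="text" name="title"><select name="genre"></select>
  <input type="submit">
</form>
"""

if __name__ == "__main__":
    for form in find_query_interfaces(SAMPLE_PAGE):
        print(form["action"], form["fields"])
```

A real classifier would probably also weigh labels, the form method, and the surrounding DOM context. The sketch only shows where a judgment like this fits, before the crawler tries to fill in a form.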

Keywords/Search Tags:

search engine, deep web, form, dom tree, Lucene

Related items

1	Research And Implementation Of Tree Based Search Engine
2	The Implementation Of Web Search Engine Based On Lucene
3	Research And Implementation Of Enterprise Search Engine System Based On Lucene
4	The Research And Implementation Of Full-Text Search Engine Based On Lucene
5	The Research And Implementation Of Enterprise Search Engine Based On Lucene
6	Research And Improvement Of Lucene-based Search Engine
7	Studies And Examples Of Search Engine Based On Lucene And Heritrix Build
8	Research And Application Of Intranet Search Engine Technology Based On Lucene
9	The Research And Implementation On Lucene-Based Topic Search Engine
10	Design And Implement Of Information Document Search Engine System Based On JavaEE Platform And Lucene