
The Research Of Deep Web Data Integration In Eat And Play Sites

Posted on: 2009-06-22
Degree: Master
Type: Thesis
Country: China
Candidate: J B Li
GTID: 2178360245996000
Subject: Computer software and theory
Abstract/Summary:
With the development of the Internet, a large number of web sites of many kinds have appeared in a short time. People must spend a great deal of time and effort across these sites to find useful information. Sites that provide search services, such as Google, Yahoo, and Baidu (called traditional search engines here), can search for information across different sites. However, the crawlers of traditional search engines only follow links from one page to another, so a new page can only be discovered from a page that is already known. If a page is not indexed, it will never be found. Pages that cannot be indexed in this way form part of the Deep Web.

To address these shortcomings of traditional search engines, this paper proposes a task-specific crawler model for the Deep Web, based on information integration and traditional search engine technology. On the basis of this model, we design DWDIS, a framework for a task-specific user query system. We study in depth a number of key issues involved in the model and the framework, and give a preliminary discussion of their design and implementation. Based on the DWDIS framework, we implement the System of Easy to Eat or Play, which provides an initial implementation and an application-level validation of the framework.

The research and innovations of this paper include:

1. A task-specific crawler model for the Deep Web is established, which is the theoretical basis of this paper. Starting from the crawler model of traditional search engines and the requirements of the Deep Web, we add work steps to the crawler. The main task of a Deep Web crawler is to analyze the search forms in web pages: with the help of a semantic ontology, it analyzes a search form, fills it in with test queries, and analyzes the returned results. By contrast, the task of a traditional search engine's crawler is to follow links from one page to another, downloading and indexing the pages it finds.

2. Based on this model, the framework DWDIS, a task-specific user query system, is designed. A wide range of information integration modules and technologies are applied in this framework, and a domain ontology is established. With the help of the domain ontology, the system runs smoothly.

3. For the Deep Web crawler model, we implement the key parts and discuss their algorithms. We do not repeat the work that is the same as in a traditional search engine's crawler. The discussion focuses on the parts that differ: the label and form element matching algorithm, the mapping between form elements and ontology attributes, form filling, and the quality standards and measurement methods. Finally, we discuss the difficulties of extracting information from result pages and ways to overcome them.

4. The key parts of the proposed model and methods are implemented. Based on the DWDIS framework, we implement the System of Easy to Eat or Play, and the work of this paper is verified in the eating and entertainment domain. This development and validation work provides an experimental basis for further research and for wider application.

5. In the System of Easy to Eat or Play, we propose the concept of the activities-map and implement its generation and display. We also implement the user interaction interface. By revising the semantic ontology, the system can be used in other fields that involve geographic information.

Starting from the shortcomings of traditional search engines, this paper explores how to search Deep Web information effectively, and hopes to provide useful ideas and approaches for solving this problem.
The subject of this paper is a widely applicable technology in the field of information integration. It offers ideas and methods for Internet information search, and it is also of some help to information integration more broadly. The work therefore has both theoretical research value and practical significance.
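As an illustration of the form element matching and form filling steps mentioned in the abstract, the following is a minimal Python sketch, not the thesis's actual algorithm. It assumes a toy domain ontology given as attribute names with synonyms, and it uses plain string similarity from the standard library in place of whatever matching measure the thesis defines. The names ONTOLOGY, match_label, and fill_form, the threshold value, and the sample data are all hypothetical.

```python
from difflib import SequenceMatcher

# Hypothetical toy ontology for a restaurant-search domain: each attribute
# maps to a few synonyms. The abstract does not give the real ontology.
ONTOLOGY = {
    "cuisine": ["cuisine", "food type", "style of cooking"],
    "location": ["location", "district", "area", "address"],
    "price": ["price", "average cost", "budget"],
}


def similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def match_label(label: str, ontology=ONTOLOGY, threshold: float = 0.6):
    """Return the ontology attribute whose synonym best matches the form
    label, or None if no synonym reaches the (assumed) threshold."""
    best_attr, best_score = None, 0.0
    for attr, synonyms in ontology.items():
        for synonym in synonyms:
            score = similarity(label, synonym)
            if score > best_score:
                best_attr, best_score = attr, score
    return best_attr if best_score >= threshold else None


def fill_form(form_fields: dict, query: dict) -> dict:
    """Map a user query expressed in ontology attributes onto a site's form.

    form_fields: {form field name: visible label text}
    query:       {ontology attribute: value}
    Fields whose label matches no queried attribute are left unfilled.
    """
    filled = {}
    for field_name, label in form_fields.items():
        attr = match_label(label)
        if attr is not None and attr in query:
            filled[field_name] = query[attr]
    return filled


if __name__ == "__main__":
    site_form = {"q1": "Food type", "q2": "Area", "q3": "Sort by"}
    user_query = {"cuisine": "Sichuan", "location": "Jinan"}
    print(fill_form(site_form, user_query))
```

In this sketch the ontology acts as the shared vocabulary: the same user query can be mapped onto differently labeled forms on different sites, which is the role the abstract assigns to the domain ontology. A real system would also need the result-page extraction step, which is not shown here.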
Keywords/Search Tags:Traditional Search Engine, Crawler, Deep Web, Ontology, Matching, Information Integration