Design And Implementation Of Web Information Integration Platform

Posted on: 2013-08-29, Degree: Master, Type: Thesis
Country: China, Candidate: K Yang, Full Text: PDF
GTID: 2248330374985432, Subject: Computer software and theory
Abstract/Summary:
With the rapid development of Internet technology and the growth of online information resources, the Internet has become an increasingly important way for people to query and access data. Faced with this huge volume of resources, people rely heavily on search engines for information retrieval. However, traditional search engines are keyword-based, which imposes limitations: they return a large number of irrelevant results, and different pages often repeat the same information. It is therefore necessary to integrate Internet information resources, so that the specific information a user cares about can be extracted from the mass of online resources, recombined, and presented in a unified form. The main research work of this thesis is the integration of Web resources, so that Internet users can find the information they need quickly and accurately.

Firstly, we study the theories and technologies related to Web information integration, including the two approaches to information integration, its three component modules, and its four key technologies. We give a comprehensive overview of the knowledge each module draws on in the design process, including the concept of ontology, Web crawlers, information extraction, and the Resource Description Framework (RDF).

Secondly, we design a prototype Web information integration platform that uses an ontology as its guide. We propose a series of design schemes: a focused Web crawler based on the ontology and a search engine, page filtering algorithms based on the ontology, information extraction rules based on the ontology and DOM tree paths, an RDF-based data storage model, and a browser/server (B/S) front end for displaying results. On this platform, users can specify the domain they want to integrate. The system then retrieves and integrates Internet resources related to that domain and presents them to the user in a unified structure. The system does not build wrappers for individual data sources; its scope is the whole Internet, so it can integrate a variety of heterogeneous Internet resources.

Finally, we test the Web information integration platform comprehensively, covering the efficiency of the crawler, the number of pages it crawls, and the data extraction rate. The tests show that the system can integrate some of the heterogeneous data sources on the Internet, but that it also has some shortcomings.
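For illustration only, the ontology-guided page filtering and DOM-path extraction described above might be sketched in Python as follows. This is not the thesis's implementation. The `ONTOLOGY` dictionary, its terms and paths, the `relevance` measure and its threshold, and the sample page are all hypothetical, and the page is assumed to be well-formed XHTML so that the standard-library parser can read it. Each extracted value is emitted as a subject-predicate-object triple, the shape of data an RDF store holds.

```python
import xml.etree.ElementTree as ET

# Hypothetical mini "ontology": one concept, the terms that signal it, and
# the DOM path where each property is expected. Not taken from the thesis.
ONTOLOGY = {
    "concept": "Book",
    "terms": {"book", "author", "price"},
    "property_paths": {
        "title": "./body/div/h1",
        "author": "./body/div/span[@class='author']",
    },
}


def relevance(text, terms):
    """Return the fraction of ontology terms that occur in the page text."""
    words = set(text.lower().split())
    return len(terms & words) / len(terms)


def extract_triples(page_id, markup, ontology, threshold=0.5):
    """Filter a page by ontology terms, then extract (s, p, o) triples."""
    root = ET.fromstring(markup)
    text = " ".join(root.itertext())
    if relevance(text, ontology["terms"]) < threshold:
        return []  # page judged off-topic, nothing is extracted
    triples = [(page_id, "rdf:type", ontology["concept"])]
    for prop, path in ontology["property_paths"].items():
        node = root.find(path)  # follow the DOM tree path for this property
        if node is not None and node.text:
            triples.append((page_id, prop, node.text.strip()))
    return triples


PAGE = """<html><body><div>
<h1>Dune</h1>
<span class="author">Frank Herbert</span>
<p>A book by this author at a low price</p>
</div></body></html>"""

OFF_TOPIC = "<html><body><div><h1>Weather</h1></div></body></html>"

if __name__ == "__main__":
    for triple in extract_triples("page1", PAGE, ONTOLOGY):
        print(triple)
```

Because the extraction rules live in the ontology rather than in per-site wrappers, the same function can be applied to any page that passes the relevance filter. Real pages would need a tolerant HTML parser and more robust path rules.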
Keywords/Search Tags: Web, heterogeneous resources, information integration, information extraction, ontology