
Intelligent Search Engine Model Based On The Multi Body

Posted on: 2014-11-17; Degree: Master; Type: Thesis
Country: China; Candidate: C Zhang
GTID: 2268330398995363; Subject: Computer software and theory
Abstract/Summary:
In recent years, with the rapid development of the Internet, the volume of network data has grown exponentially and users' needs have become increasingly urgent. How to find the results a user wants, accurately and quickly, in vast amounts of web data has become a hot topic in the search engine field. Traditional search engines based on keyword matching are clearly unable to meet the needs of most Internet users. With the development of Semantic Web technologies, ontologies have come into view, and ontology-based search engine technology has attracted growing attention. After years of research, search engine technology based on a single domain ontology has made progress. However, as the Internet becomes more personalized and social, a single web resource is no longer confined to one domain: the same page may contain conceptual entities from several disciplines. Search engine technology based on multiple domain ontologies is therefore needed to respond to this trend.
This thesis focuses on the framework design of a multi-domain ontology-based search engine and its related technologies. First, in light of the current state of search engine development, it discusses the core issues and evaluation criteria of search engine technology and analyzes eight development trends of search engines. Next, it analyzes and compares existing semantic annotation tools and methods, and proposes a design for a semantic annotation tool model that supports multiple ontologies. Finally, building on this semantic annotation model and on traditional search engine technology, it proposes a search engine framework based on multiple ontologies.
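The idea of a semantic annotation model that supports multiple ontologies can be illustrated with a small sketch. The abstract does not describe the model's internals, so the following Python is an assumption-laden illustration rather than the thesis's method: `ONTOLOGIES` is a hypothetical pair of toy domain ontologies mapping labels to made-up concept identifiers (such as `cs:SearchEngine`), and plain case-insensitive label matching stands in for a real annotation engine. It shows how a single page can receive annotations drawn from more than one domain ontology.

```python
import re
from dataclasses import dataclass

# HYPOTHETICAL toy ontologies, keyed by domain. Each maps a label to a made-up
# concept identifier; a real system would load these from OWL/RDF files.
ONTOLOGIES = {
    "computing": {
        "search engine": "cs:SearchEngine",
        "inverted index": "cs:InvertedIndex",
    },
    "medicine": {
        "diabetes": "med:Diabetes",
        "insulin": "med:Insulin",
    },
}


@dataclass(frozen=True)
class Annotation:
    start: int    # offset where the matched label begins in the text
    end: int      # offset just past the matched label
    label: str    # surface text that matched
    domain: str   # domain ontology that supplied the concept
    concept: str  # concept identifier within that ontology


def annotate(text, ontologies=ONTOLOGIES):
    """Tag every occurrence of any ontology label, across all domains."""
    found = []
    for domain, labels in ontologies.items():
        for label, concept in labels.items():
            # Word boundaries stop "insulin" matching inside a longer word.
            pattern = r"\b" + re.escape(label) + r"\b"
            for m in re.finditer(pattern, text, re.IGNORECASE):
                found.append(Annotation(m.start(), m.end(), m.group(0), domain, concept))
    # Report annotations in document order.
    return sorted(found, key=lambda a: a.start)


def domains_of(annotations):
    """Set of domains whose concepts appear on the page."""
    return {a.domain for a in annotations}


page = "A search engine for diabetes research can suggest insulin dosing studies."
anns = annotate(page)
print([a.concept for a in anns])  # ['cs:SearchEngine', 'med:Diabetes', 'med:Insulin']
print(sorted(domains_of(anns)))   # ['computing', 'medicine']
```

Here the one page is tagged with a `computing` concept and two `medicine` concepts, so `domains_of` reports both domains. A practical tool would also need to handle morphology, synonyms, and ambiguous labels, which regex matching does not.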
An experimental system was built to test the framework, and the results were analyzed. The framework consists of six modules: domain information collection, web resource preprocessing, metadata extraction, metadata indexing, query expansion, and query result re-ranking. The domain information collection module, built on a traditional information acquisition system, proposes a collection strategy that dynamically determines whether a page belongs to a specific domain. The preprocessing module implements web page denoising, with a focus on an algorithm for removing duplicate pages. The metadata extraction module proposes an XSLT-based strategy for extracting metadata from HTML as structured documents. The metadata indexing module uses inverted index technology to index the extracted metadata documents, laying the foundation for retrieval. The query expansion module extends queries syntactically, semantically, and through reasoning. The query re-ranking module, built on the open-source Lucene framework, combines three factors (link evaluation, the document's composite score, and how well the document matches the domain ontology) to improve retrieval precision. Tests on the experimental system show that the model outperforms the traditional keyword-based retrieval model in both recall and precision.
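The metadata indexing and query re-ranking modules can be sketched together. This is a minimal Python illustration under stated assumptions, not the thesis's implementation and not Lucene's API: a toy `InvertedIndex` with a simple tf-idf score stands in for Lucene's relevance score (treated here as the document's composite score), and `rerank` combines it with a link score and an ontology-match score by a weighted sum. The weights `(0.5, 0.2, 0.3)` and all per-document factor values are hypothetical; the abstract names the three factors but not how they are combined.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase alphanumeric tokens; a stand-in for a real analyzer."""
    return re.findall(r"[a-z0-9]+", text.lower())


class InvertedIndex:
    """Toy inverted index: term -> {doc_id: term frequency}."""

    def __init__(self):
        self.postings = defaultdict(dict)
        self.doc_count = 0

    def add(self, doc_id, metadata_text):
        self.doc_count += 1
        for term, tf in Counter(tokenize(metadata_text)).items():
            self.postings[term][doc_id] = tf

    def search(self, query):
        """Score every document containing a query term with a simple tf-idf sum."""
        scores = defaultdict(float)
        for term in tokenize(query):
            docs = self.postings.get(term)
            if not docs:
                continue
            idf = math.log(1 + self.doc_count / len(docs))
            for doc_id, tf in docs.items():
                scores[doc_id] += tf * idf
        return dict(scores)


def rerank(text_scores, link_scores, ontology_match, weights=(0.5, 0.2, 0.3)):
    """Order documents by a weighted sum of three factors.

    HYPOTHETICAL weights; text scores are scaled to [0, 1] by the best score,
    while link and ontology-match values are assumed to be in [0, 1] already.
    """
    w_text, w_link, w_onto = weights
    best = max(text_scores.values(), default=0.0) or 1.0
    combined = {
        doc: w_text * score / best
        + w_link * link_scores.get(doc, 0.0)
        + w_onto * ontology_match.get(doc, 0.0)
        for doc, score in text_scores.items()
    }
    # Highest combined score first; ties broken by doc id for stable output.
    return sorted(combined, key=lambda doc: (-combined[doc], doc))


idx = InvertedIndex()
idx.add("d1", "ontology based search engine")
idx.add("d2", "search engine search engine ranking")
idx.add("d3", "cooking recipes")

scores = idx.search("search engine")
print(sorted(scores, key=scores.get, reverse=True))  # ['d2', 'd1']

# HYPOTHETICAL per-document link and ontology-match values in [0, 1].
ranked = rerank(scores, {"d1": 0.5, "d2": 0.2}, {"d1": 1.0, "d2": 0.0})
print(ranked)  # ['d1', 'd2']
```

In this example the text score alone ranks `d2` first because it repeats the query terms; once the link and ontology-match factors are added, `d1` moves ahead. A weighted linear combination is only one plausible way to merge the factors; the thesis may combine them differently.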
Keywords/Search Tags: multi-ontology search engines, semantic annotation, metadata index, semantic retrieval