Research On Agriculture P2P Search Engine Key Technology Based On Simple Ontology

Posted on: 2014-01-19
Degree: Doctor
Type: Dissertation
Country: China
Candidate: W L Zheng
Full Text: PDF
GTID: 1228330398994908
Subject: Agricultural information technology
Abstract/Summary:
With the explosive growth of resources on the Internet, more and more users rely on specialized search tools such as Google, Yahoo, and Baidu to retrieve the information they want. Typically, a search engine first retrieves relevant web pages from hundreds of sites and stores them on a file server. After analyzing and indexing the pages, it uses the generated index to locate the query keywords across the web and returns the most appropriate content according to a specific ranking function. In terms of logical architecture, large-scale web search engines are centralized, whereas the sites they cover are spread all over the world, each with its own indexing and query-processing mechanism. Centralized search engines therefore face major challenges in scalability, coverage, security, and domain specialization.
To address these problems, this paper proposes a distributed search engine based on a P2P network. It proposes separate retrieval methods for two topologies, structured and unstructured P2P networks, and uses latent semantic indexing to cluster and merge the results of the two retrieval mechanisms.
This paper applies several of these key technologies to an agricultural search engine system based on a simple ontology. Specifically, the main results of the research are as follows:
(1) Based on the "Agricultural Sciences Thesaurus", this paper uses the ontology editor Protégé to construct a simple agricultural ontology and designs an algorithm to convert the thesaurus vocabulary into the ontology in bulk.
(2) This paper presents a framework for a P2P search engine based on the agricultural ontology and builds, on top of the P2P network, a global distributed index directory based on the simple agricultural ontology. The directory stores the meta-information of the node indexes, links them to the classes of the agricultural ontology, and provides a basis for the system.
(3) The research covers two P2P network topologies and uses different retrieval methods for structured and unstructured networks.
For structured P2P networks, this paper presents an adaptive index retrieval method that combines a Chord ring with a balanced tree. It counts the occurrences of search terms in the tree structure and classifies the terms by importance, while the Chord ring is used to index the terms of the tree nodes. Specifically, at each node of the tree, the system classifies terms as either important or unimportant. Important terms, which distinguish the node from its neighbor nodes, are indexed in the Chord ring. Unimportant terms, which are either popular or rare, are aggregated to the parent node. With this method, a query request can be processed starting from any node and does not always have to start from the root.
Therefore, even though a tree structure is used, the root and its nearby nodes do not become a bottleneck.
For unstructured P2P networks, the overlay network is constructed with an algorithm that combines the vector space model (VSM) with relevance ranking, and retrieval uses a K-iteration preference based on semantic groups. This method effectively addresses the problems of retrieval efficiency and precision and reduces the retrieval cost of the system.
(4) In a distributed search engine, each query request is forwarded to multiple nodes, and the results are then sorted by degree of relevance and combined into a single result list. Nodes in different network topologies use different retrieval mechanisms, so the documents returned from each node cannot be directly compared and merged. To solve this problem, this paper adopts search result clustering, which applies latent semantic indexing to the full document content, uses Apache Lucene as the indexing engine, and uses the Spring Rich Client Platform to test the clustering engine, with satisfactory results.
In this paper, the system uses Java JDK 1.5 to simulate the P2P networks; each node is identified by an IP address and port number, and the simulator uses several parameters to control the properties of the network. Compared with similar methods in the experiments, the proposed method shows clear advantages in recall, precision, and query latency.
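The term classification described for the structured network can be illustrated with a minimal Python sketch. It is not the dissertation's implementation. The abstract does not define "popular" or "rare", so the thresholds `rare_max` and `popular_share` are assumptions, as are the function names. A plain dictionary stands in for the Chord ring, and a `Counter` stands in for the parent node's aggregated terms.

```python
from collections import Counter


def classify_terms(local_counts, neighbor_counts, rare_max=1, popular_share=0.5):
    """Split one tree node's terms into (important, unimportant) sets.

    Assumed rules (not given in the abstract):
      - rare:    the term occurs at most `rare_max` times at this node
      - popular: more than `popular_share` of the neighbor nodes also hold it
    Anything that is neither rare nor popular is treated as important.
    """
    important, unimportant = set(), set()
    for term, count in local_counts.items():
        holders = sum(1 for nc in neighbor_counts if nc.get(term, 0) > 0)
        popular = bool(neighbor_counts) and holders / len(neighbor_counts) > popular_share
        rare = count <= rare_max
        if popular or rare:
            unimportant.add(term)
        else:
            important.add(term)
    return important, unimportant


def index_node(node_id, local_counts, neighbor_counts, ring_index, parent_counts):
    """Index important terms in the ring stand-in; aggregate the rest upward."""
    important, unimportant = classify_terms(local_counts, neighbor_counts)
    for term in important:
        # Hypothetical stand-in for a Chord put(): term -> nodes holding it.
        ring_index.setdefault(term, set()).add(node_id)
    for term in unimportant:
        # Unimportant terms are pushed to the parent node's term counts.
        parent_counts[term] += local_counts[term]
    return important, unimportant


# Illustrative data only; the terms and counts are made up.
local = Counter({"rice": 5, "soil": 4, "the": 9, "zebra": 1})
neighbors = [Counter({"the": 7, "wheat": 3}), Counter({"the": 8, "soil": 2})]
ring, parent = {}, Counter()
index_node("n1", local, neighbors, ring, parent)
```

Under these assumed thresholds, "rice" and "soil" are indexed in the ring stand-in, while "the" (held by every neighbor) and "zebra" (a single occurrence) are aggregated to the parent. A real system would hash each term to a ring position instead of using a local dictionary.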
Keywords/Search Tags:P2P network, Distributed search engine, Simple agricultural ontology, Result merging