Research On Selection Of Seed-URLs Based On User-interest Ontology

Posted on: 2012-08-18
Degree: Master
Type: Thesis
Country: China
Candidate: B C Han
Full Text: PDF
GTID: 2178330335453073
Subject: Computer software and theory
Abstract/Summary:
In recent years, ontology technology has developed rapidly. Because of its well-defined conceptual hierarchy and its support for logical reasoning, it has been widely used in knowledge representation and information retrieval. With the dramatic growth of Web technology, the complexity of Web structure, the dynamism of Web data, and the broadening of user topics pose a great challenge to existing search engines. How to locate the resources relevant to a user's topic in this ocean of information, effectively and accurately, has become a research focus. This research combines search engines with ontology theory to propose a seed URL selection method that provides the entry points for a topic crawler. It also demonstrates experimentally the importance of seed URLs to search engines.

Firstly, based on Formal Concept Analysis (FCA) theory, we put forward a construction method for the user-interest ontology. The method works as follows: it merges concept lattices in a top-down manner to generate an optimized concept lattice that expresses the user's interests, and then uses the LMOA algorithm for concept lattice-to-ontology transformation to convert the optimized concept lattice into the user-interest ontology. The purpose of the user-interest ontology is to guide the behavior of the topic crawler and to select relevant Web pages that meet personalized needs.

Secondly, using the Web link structure, we propose a method for seed URL selection based on user interest. The features of this method are:
① It combines user interest with the HITS algorithm. On the one hand, it uses the ontology information to prune the base set of the HITS algorithm, which improves the algorithm's ability to identify topics. On the other hand, the authority pages and hub pages are used to describe the topic area, update the user-interest ontology, expand the user interest, and represent the user's needs accurately.
② By combining the Web link structure with the user-interest ontology and applying graph theory, we recast the problem of "finding the core topic area" as "finding a complete bipartite directed graph within a bipartite directed graph", which reduces the difficulty of the algorithm.
③ We expand the user interest feature vector using the user-interest ontology, compute its similarity with the authority pages, and re-filter the search results to obtain the seed URLs.

Finally, the proposed methods are implemented and verified in programs developed with VC6.0. Experiment (1): ten user query terms are submitted to Wikipedia, a concept lattice is built from the returned results, and the user-interest ontology is constructed from the concept lattice. Experiment (2): we implement the seed URL selection method to provide the entry points for the crawler so that it returns more user-relevant information. The experiments show that ontology construction based on the concept lattice merger method expresses the user's interests and background knowledge relatively well and eliminates semantic ambiguity. The user-interest ontology is therefore a basis for personalized information retrieval.

In addition, we evaluate the seed URL selection method experimentally. We submit the selected seed URLs to a general-purpose crawler and compare the number of relevant pages obtained when the same number of pages is downloaded. A comparison of the experimental results of three methods shows that the proposed seed URL selection method based on the user-interest ontology is effective.
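The seed selection steps described above (pruning the HITS base set with interest terms, ranking by authority score, then re-filtering by similarity to the interest vector) can be illustrated with a minimal Python sketch. This is not the thesis implementation, which was written with VC6.0 and uses an ontology built from concept lattices. Here the ontology is reduced to a plain set of interest terms, and the function names, the toy pages and links, the term-overlap pruning rule, and the similarity threshold are all assumptions made for illustration.

```python
import math
from collections import Counter


def prune_base_set(pages, interest_terms, min_overlap=1):
    """Keep pages whose text shares at least min_overlap terms with the
    interest terms (a stand-in for ontology-based pruning; assumption)."""
    return {url for url, text in pages.items()
            if len(set(text.lower().split()) & interest_terms) >= min_overlap}


def normalize(scores):
    """Scale a score dictionary to unit Euclidean length."""
    norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
    return {k: v / norm for k, v in scores.items()}


def hits(links, nodes, iterations=50):
    """Plain HITS iteration on the subgraph induced by `nodes`."""
    edges = [(s, t) for s, t in links if s in nodes and t in nodes]
    auth = {n: 1.0 for n in nodes}
    hub = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority score: sum of hub scores of pages linking in.
        new_auth = {n: 0.0 for n in nodes}
        for s, t in edges:
            new_auth[t] += hub[s]
        # Hub score: sum of authority scores of pages linked to.
        new_hub = {n: 0.0 for n in nodes}
        for s, t in edges:
            new_hub[s] += new_auth[t]
        auth, hub = normalize(new_auth), normalize(new_hub)
    return auth, hub


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def select_seeds(pages, links, interest_terms, top_k=2, threshold=0.1):
    """Prune, rank by authority, then re-filter by similarity."""
    base = prune_base_set(pages, interest_terms)
    auth, _ = hits(links, base)
    interest_vec = Counter(interest_terms)
    ranked = sorted(base, key=lambda u: auth[u], reverse=True)
    seeds = [u for u in ranked
             if cosine(Counter(pages[u].lower().split()), interest_vec) >= threshold]
    return seeds[:top_k]


# Hypothetical toy data: page identifiers mapped to page text, plus links.
pages = {
    "a": "ontology concept lattice tutorial",
    "b": "ontology crawler seed selection",
    "c": "cooking recipes pasta",
    "d": "links about ontology and crawler",
}
links = [("d", "a"), ("d", "b"), ("a", "b"), ("c", "a"), ("c", "b")]
interest = {"ontology", "crawler"}

seeds = select_seeds(pages, links, interest)
print(seeds)
```

In this sketch the off-topic page "c" is dropped before HITS runs, so its outgoing links no longer influence the authority scores. That is the intended effect of pruning the base set with user-interest information.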
Keywords/Search Tags: Seed URLs, User-interest Ontology, Concept Lattice Merger, Complete Bigraph, Topic Area