Research and Implementation of Information Retrieval for Web Documents Based on OHITS and OLSA

Posted on: 2009-06-29
Degree: Master
Type: Thesis
Country: China
Candidate: Z C Li
Full Text: PDF
GTID: 2178360242466531
Subject: Computer application technology
Abstract/Summary:
With the rapid growth of information of all kinds, retrieving information from the Internet effectively and precisely is becoming both more important and more difficult. Building on previous research, this paper proposes several optimized algorithms tailored to the characteristics of Web document retrieval, so that retrieval results are more precise and better match user requirements.

To make document retrieval results more precise, retrieval has to be performed semantically. An Ontology is a formal specification of the semantics of concepts, and it is also a general categorization system for concepts, so it can provide contextual semantic support for concepts. The probability of a concept within a categorization system is defined as the probability that an object drawn at random from the domain is categorized as an instance of that concept. This probability is intrinsic to the concept. Moreover, given that an object has been categorized under a concept, the probability that it is categorized under a sub-concept of that concept is also intrinsic. In this paper, this conditional probability is called the transfer probability; it can be given by domain experts or estimated statistically. However, calculating the probability of each concept in the Ontology is somewhat difficult. To solve this problem, a probability calculation method and an asymmetric semantic similarity calculation method are proposed.

Link analysis and content analysis are the two most important methods for Web document retrieval, and their most representative algorithms are HITS and LSA, respectively. To address the topic drift problem of HITS, I propose a new method that calculates the similarity between web pages from the similarities between concepts in the Ontology, and uses the web page similarity as the weight of the link between two web pages. HITS is thereby optimized, yielding the OHITS algorithm. To address the loss of semantic information and of special display information in the traditional LSA method, this paper introduces SIF and LSDIF to optimize LSA and proposes the OLSA algorithm.

The experiments demonstrate the validity of the OHITS and OLSA algorithms. In addition, these methods have been implemented in the MIA System, whose design and implementation are discussed in full. The paper ends with a conclusion and a discussion of future work.
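
The idea of using page similarity as a link weight can be illustrated with a generic weighted variant of HITS. The following is a minimal sketch, not the OHITS algorithm itself, whose exact formulas are not given in this abstract. The function name `weighted_hits`, the page names, and the similarity values are all hypothetical; in the thesis the weights would come from Ontology-based concept similarity.

```python
import math


def _normalize(scores):
    """Scale scores to unit Euclidean length (left unchanged if all zero)."""
    norm = math.sqrt(sum(v * v for v in scores.values()))
    if norm == 0.0:
        return dict(scores)
    return {page: v / norm for page, v in scores.items()}


def weighted_hits(link_weights, iterations=50):
    """Run HITS where each link (source, target) carries a weight.

    link_weights: dict mapping (source, target) -> nonnegative weight.
    Here the weight stands in for a page-to-page similarity (assumption).
    Returns (hub_scores, authority_scores) as dicts keyed by page.
    """
    pages = sorted({page for link in link_weights for page in link})
    hub = {page: 1.0 for page in pages}
    auth = {page: 1.0 for page in pages}

    for _ in range(iterations):
        # Authority update: weighted sum of hub scores of pages linking in.
        new_auth = {page: 0.0 for page in pages}
        for (src, dst), weight in link_weights.items():
            new_auth[dst] += weight * hub[src]

        # Hub update: weighted sum of the new authority scores linked to.
        new_hub = {page: 0.0 for page in pages}
        for (src, dst), weight in link_weights.items():
            new_hub[src] += weight * new_auth[dst]

        auth = _normalize(new_auth)
        hub = _normalize(new_hub)

    return hub, auth


# Hypothetical example: pages "a" and "b" link to an on-topic page "c"
# (high similarity) and an off-topic page "d" (low similarity).
example_links = {
    ("a", "c"): 0.9,
    ("b", "c"): 0.8,
    ("a", "d"): 0.1,
    ("b", "d"): 0.1,
}
hub_scores, auth_scores = weighted_hits(example_links)
```

With unweighted links, "c" and "d" would receive the same authority score, because both are linked from the same two pages. With similarity weights, the off-topic page "d" contributes and receives little, which is the intuition behind using link weights to limit topic drift.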
Keywords/Search Tags: OLSA, OHITS, Semantic similarity, Web document, Singular value decomposition