Research And Implementation Of Information Retrieval For Web Documents Based On OHITS And OLSA

Posted on:2009-06-29

Degree:Master

Type:Thesis

Country:China

Candidate:Z C Li

Full Text:PDF

GTID:2178360242466531

Subject:Computer application technology

Abstract/Summary:

With the rapid growth of information of all kinds, retrieving information from the Internet effectively and precisely is becoming both more important and more difficult. Building on previous research, this paper proposes several optimized algorithms tailored to the characteristics of Web document retrieval, so that retrieval results become more precise and better match user requirements.

To make document retrieval results more precise, retrieval has to be carried out at the semantic level. An ontology is a formal specification of the semantics of concepts and also a general classification system for concepts, so it can provide contextual semantic support for concepts. The probability of a concept within a classification system is defined as the probability that an object drawn at random from the domain is classified as an instance of that concept. This probability is intrinsic. Moreover, given that an object has been classified under a concept, the probability that it is classified under a sub-concept of that concept is also intrinsic. In this paper the latter probability is called the transfer probability; it can be supplied by domain experts or estimated statistically. Computing the probability of every concept in the ontology is nevertheless somewhat difficult. To solve this problem, a probability calculation method and an asymmetric semantic similarity calculation method are proposed.

Link analysis and content analysis are the two most important approaches to Web document retrieval, and their most representative algorithms are HITS and LSA, respectively. To solve the topic drift problem of HITS, I propose a new method that calculates the similarity between web pages from the similarities between concepts in the ontology, and uses this page similarity as the weight of the link between two web pages. HITS is optimized in this way, yielding the OHITS algorithm. To address the loss of semantic information and of special display information in the traditional LSA method, this paper introduces SIF and LSDIF to optimize LSA and proposes the OLSA algorithm.

The experiments demonstrate the validity of the OHITS and OLSA algorithms. In addition, these methods have been implemented in the MIA System, whose design and implementation are discussed in full. The paper closes with conclusions and directions for future work.
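The idea of weighting each link by page similarity can be illustrated with a minimal sketch of a weighted HITS iteration. This is not the OHITS algorithm itself, whose details are not given in the abstract. The function names `weighted_hits` and `normalize`, the `similarity` callback, and the fixed iteration count are all assumptions made for illustration; in the thesis the similarity would come from ontology concept similarities.

```python
import math


def normalize(scores):
    """Scale scores to unit Euclidean length (left unchanged if all zero)."""
    norm = math.sqrt(sum(v * v for v in scores.values()))
    if norm == 0:
        return dict(scores)
    return {page: v / norm for page, v in scores.items()}


def weighted_hits(links, similarity, iterations=50):
    """Hypothetical weighted HITS sketch.

    links: dict mapping a page to the pages it links to.
    similarity: function (src, dst) -> non-negative link weight
                (assumed here; stands in for an ontology-based page similarity).
    Returns (hub, authority) score dicts.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)

    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}

    for _ in range(iterations):
        # Authority update: sum of hub scores of in-linking pages,
        # each scaled by the similarity weight of the link.
        new_auth = {p: 0.0 for p in pages}
        for src, targets in links.items():
            for dst in targets:
                new_auth[dst] += similarity(src, dst) * hub[src]

        # Hub update: sum of the new authority scores of linked pages,
        # again scaled by the link weight.
        new_hub = {p: 0.0 for p in pages}
        for src, targets in links.items():
            for dst in targets:
                new_hub[src] += similarity(src, dst) * new_auth[dst]

        auth = normalize(new_auth)
        hub = normalize(new_hub)

    return hub, auth
```

With a low weight on an off-topic link, the target of that link receives little authority even though it is linked from a strong hub, which is the intuition behind using similarity to limit topic drift.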

Keywords/Search Tags:

OLSA, OHITS, Semantic similarity, web document, singular value decomposition

Related items

1	Discovering semantic relations using singular value decomposition based techniques
2	Research On Image Denoising Algorithm Based On Image Self-similarity And Singular Value Decomposition
3	Research On Semantic Similarity Computation And Applications
4	Subjective And Objective Combination Of Semantic Similarity Algorithm And Its Application
5	Dynamic Document Clustering using singular value decomposition
6	Research Of P2P Document Query Based On Semantic Similarity
7	A New Collaborative Filtering Algorithm Based On Both Local And Global Similarity And Singular Value Decomposition(SVD)
8	Folding-up: A hybrid method for updating the partial singular value decomposition in latent semantic indexing
9	Research On Key Technologies Of Context-Aware Web Search
10	Research On Image Objective Quality Assessment