The Study On Ranking And Similarity Calculation In Information Retrieval

Posted on:2009-08-07

Degree:Master

Type:Thesis

Country:China

Candidate:P Yan

Full Text:PDF

GTID:2178360245995010

Subject:Computer system architecture

Abstract/Summary:

With the continuing growth of the information society, people's information needs keep increasing, and accessing useful information quickly and efficiently has become a central concern. Research on information retrieval helps people find information of interest and obtain useful knowledge. The core issue of information retrieval is predicting the relevance of documents and ranking them by that relevance; in general, the top-ranked document is considered the most relevant. Relevance calculation and ranking algorithms are therefore the main problems in information retrieval.

Traditional information retrieval mainly uses the vector space model to calculate relevance, and the same model is also used in Web information retrieval. Compared with ordinary documents, however, Web pages have many distinctive features, such as URLs, HTML tags, anchor text, and in-degree. Web pages are also connected by hyperlinks, and analyzing these links can improve the ranking of search results. The Deep Web is a special kind of Web resource whose information is stored in databases that users can reach only through pages containing database forms. These pages contain little text and have few links between them, so relevance methods designed for general Web pages give very poor results on them.

This thesis focuses on Web and Deep Web information retrieval, and on the following aspects:

1. We built a full-text retrieval system based on the vector space model. On this system we tested how HTML tags, anchor text, and in-degree can be used to improve relevance calculation. We also analyzed the URL features of Web pages and developed a method for re-ranking search results. The system performed well in SEWM2007.

2. To exploit the links between Web pages, a topic-oriented PageRank algorithm is proposed. The new algorithm takes the following factors into account: the relevance of a page's content to the topic, a topic-based classification of the page's links, and the importance of the page itself. Experiments show that, for two given topics, the new algorithm outperforms the PageRank algorithm in terms of P@10 and user acceptance.

3. Two methods of calculating the semantic relevance between Deep Web databases are proposed. The first method is based on the vector space model, but the semantic distance between two databases is calculated from both the distance between the text content of their HTML pages and the distance between the database forms embedded in those pages. Hierarchical fuzzy sets are used, and a unification step for database attributes is proposed, in which attribute labels that are semantically close are replaced with a single representative label. The second method is based on ontology theory and fuzzy sets: the database forms are converted from vectors into concept fuzzy sets, and the similarity between databases is calculated as the necessity degree of matching between these fuzzy sets. Classification and clustering algorithms, respectively, are used to test the new methods. Experiments show that both new semantic methods perform better than traditional ones.
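To make the idea of a topic-aware link analysis more concrete, the following is a minimal Python sketch of a topic-biased PageRank. It is not the algorithm proposed in the thesis, whose details are not given in the abstract. It only illustrates one common way to combine link structure with topic relevance: the random-jump (teleport) step is weighted by a per-page topic relevance score. The function name `topic_biased_pagerank`, the `topic_relevance` scores, and the damping value 0.85 are assumptions chosen for illustration.

```python
def topic_biased_pagerank(links, topic_relevance, damping=0.85, iterations=50):
    """Power iteration for PageRank with a topic-weighted teleport vector.

    links: dict mapping each page to the list of pages it links to.
    topic_relevance: dict mapping pages to non-negative relevance scores
        for the topic (hypothetical inputs; pages not listed count as 0).
    Returns a dict mapping each page to its rank; the ranks sum to 1.
    """
    # Collect every page that appears as a link source or a link target.
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})

    total_relevance = sum(topic_relevance.get(p, 0.0) for p in pages)
    if total_relevance <= 0:
        raise ValueError("at least one page must have positive topic relevance")

    # Teleport distribution: proportional to topic relevance
    # (an assumption of this sketch, not taken from the thesis).
    teleport = {p: topic_relevance.get(p, 0.0) / total_relevance for p in pages}

    # Start from a uniform distribution over all pages.
    rank = {p: 1.0 / len(pages) for p in pages}

    for _ in range(iterations):
        # Rank held by pages without out-links is redistributed via teleport.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {
            p: (1.0 - damping + damping * dangling) * teleport[p] for p in pages
        }
        # Each page passes its damped rank evenly to the pages it links to.
        for source in pages:
            targets = links.get(source, [])
            if targets:
                share = damping * rank[source] / len(targets)
                for target in targets:
                    new_rank[target] += share
        rank = new_rank

    return rank


if __name__ == "__main__":
    # Toy graph with made-up relevance scores, for illustration only.
    toy_links = {"a": ["b"], "b": ["a", "c"], "c": []}
    toy_relevance = {"a": 1.0, "b": 0.5}
    for page, score in sorted(topic_biased_pagerank(toy_links, toy_relevance).items()):
        print(page, round(score, 4))
```

In this sketch, pages judged relevant to the topic receive more of the teleport mass, so they and the pages they link to tend to rank higher than under a uniform teleport. The thesis additionally classifies links by topic, which this sketch does not attempt.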

Keywords/Search Tags:

relevance calculation, link analysis, ranking, semantic similarity, degree of matching

Related items

1	Lexical-semantic Similarity Calculation And Its Application In The Revision Of ISO 860
2	Course Similarity Calculation Using Efficient Manifold Ranking
3	Research On SaaS Service Relevance Calculation Method Under Dynamic Evolution Environment
4	Research On Construction Method And Technology Of Demand And Supply Matching Platform
5	Research On Search Engine Ranking Algorithm Based On Link Analysis
6	Page Ranking Algorithm Based On Link Similarity Study
7	Research On Key Technologies Of Semantic Retrieval Based On Multimodal Data
8	Research On Feature Selection Method Based On Text Category Relevance Degree And Latent Semantic Analysis
9	Research On Movie Similarity Calculation Using Multi-feature Method
10	Semantic Based Similarity Analysis Of Human Video