Algorithm Research For Text Information Retrieval Based On Web

Posted on:2005-05-16

Degree:Master

Type:Thesis

Country:China

Candidate:M J Zhong

GTID:2168360125958543

Subject:Computer application technology

Abstract/Summary:

With the rapid development of Internet technology, the number of documents on the Internet is increasing exponentially, and how to handle this large volume of online documents has become an important research topic. Text information retrieval is the task of finding, in a collection of documents, those most relevant to a user query. This thesis studies Web-based information retrieval algorithms.

First, the thesis briefly introduces the development and techniques of information retrieval. On this basis, it analyzes content-based, link-based and fusion-based retrieval algorithms.

Second, to avoid the low recall of content-based retrieval and the topic drift of link-based retrieval, the thesis proposes a new algorithm based on hyperlinks and anchors that combines content-based and link-based retrieval. The algorithm first calculates Hub and Authority values from the links between web pages, then obtains the relevance weight of each page by matching the link anchor text or the document content against the query, and finally ranks the retrieval results. The experimental results show that the new algorithm achieves much higher precision and recall.

To improve precision and reduce retrieval time, the thesis also puts forward an information retrieval algorithm based on classification and key phrase extraction. Compared with the traditional vector space model, this algorithm reduces time complexity and improves precision, and the experimental results show that it works well. In addition, because traditional performance evaluation methods cannot effectively evaluate the ranking of the retrieved documents, a new criterion named ranking error is proposed. The experimental results indicate that the proposed algorithm outperforms both TF*IDF and interactive retrieval based on classification in terms of ranking error.

Finally, drawing on the proposed algorithms and techniques, a prototype domain-specific English full-text information retrieval system is implemented.
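
The hyperlink-and-anchor idea described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the thesis's formulas, so everything below is an assumption made for illustration: the Hub and Authority values are computed with the standard HITS iteration, the relevance weight is a simple query-term overlap with either the anchor text or the page content, and the two are combined by multiplication with a hypothetical mixing parameter `alpha`. The function names `hits`, `match_weight` and `rank` are likewise hypothetical.

```python
def _normalize(scores):
    """Scale scores to unit Euclidean length (left unchanged if all are zero)."""
    norm = sum(v * v for v in scores.values()) ** 0.5 or 1.0
    return {page: v / norm for page, v in scores.items()}


def hits(links, iterations=50):
    """Standard HITS iteration.

    links maps each page to the list of pages it links to.
    Returns (hub, authority) dictionaries keyed by page.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority of a page: sum of hub scores of the pages linking to it.
        new_auth = {p: 0.0 for p in pages}
        for src, targets in links.items():
            for t in targets:
                new_auth[t] += hub[src]
        auth = _normalize(new_auth)
        # Hub of a page: sum of authority scores of the pages it links to.
        hub = _normalize({p: sum(auth[t] for t in links.get(p, ())) for p in pages})
    return hub, auth


def match_weight(query, text):
    """Fraction of query terms found in text (an assumed, simplistic measure)."""
    terms = set(query.lower().split())
    words = set(text.lower().split())
    return len(terms & words) / len(terms) if terms else 0.0


def rank(query, links, anchors, contents, alpha=0.5):
    """Rank pages by relevance weight times a hub/authority mix.

    anchors maps a page to the anchor texts of links pointing at it;
    contents maps a page to its text. alpha is a hypothetical mixing weight.
    """
    hub, auth = hits(links)
    scores = {}
    for page in auth:
        content_w = match_weight(query, contents.get(page, ""))
        anchor_w = max(
            (match_weight(query, a) for a in anchors.get(page, [])), default=0.0
        )
        # Match on anchor text OR content, whichever matches the query better.
        relevance = max(content_w, anchor_w)
        scores[page] = relevance * (alpha * auth[page] + (1 - alpha) * hub[page])
    return sorted(scores, key=scores.get, reverse=True)
```

Taking the larger of the anchor and content matches lets a page be retrieved through the anchor text of incoming links even when its own text does not contain the query terms, which is one plausible way to address the low recall of purely content-based matching. Multiplying by the relevance weight keeps well-linked but off-topic pages from rising in the ranking, which is the topic drift problem the abstract mentions.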

Keywords/Search Tags:

Text Information Retrieval, Vector Space Model, Link, Anchor, Key Phrase Extraction, Recall, Precision

Related items

1	Study Of Text Information Retrieval Algorithms Based On Web
2	Study Of An Information Retrieval Technology Based On Improved Vector Space Model
3	The Research Of Information Retrieval Algorithm In Vector Space
4	Researched Information Retrieval Based On Bayesian Network
5	Automatic Classification Based On The Concept Of The Text
6	The Compare Two Automated Text Categorization Algorithms Based On The Open Telephone Of Mayor
7	The Method Of Fine-Grained Topic Information Extraction And Text Clustering Based On Chinese Phrase
8	Text Information Retrieval Modifier Role In The Study
9	Research And Implementation Of Text Categorization System Based On VSM
10	Design And Implementation Of Based On Vector Space Model Of Local Search Engine