Research On Documents Ranking Based On Semantic Analysis

Posted on: 2015-03-25
Degree: Master
Type: Thesis
Country: China
Candidate: S Y Chen
Full Text: PDF
GTID: 2268330428467676
Subject: Computer application technology
Abstract/Summary:
With the rapid development of society and continuous progress in science and technology, the amount of information available to people is growing geometrically, and its continuous accumulation has produced what is now known as "massive data". How to retrieve the needed information from massive data accurately and quickly has therefore become both an opportunity and a challenge of the Internet information age.

This thesis analyzes domestic and international research on topic models and semantic relevance computation. Most current studies take a purely statistical view: they match the query against documents while ignoring the semantic knowledge contained in the query and the documents. This thesis therefore mines the latent semantics of each document and matches them against the query, making the retrieved results more comprehensive and accurate.

This thesis presents a document ranking method based on a tag topic model. The method yields two important matrices, the "tag-topic" probability distribution matrix and the "word-topic" probability distribution matrix, from which the semantic contribution of each word to a document is obtained. The contribution of a word to a document is thus analyzed quantitatively rather than scored by simple term frequency and inverse document frequency. The tag topic model has a rigorous mathematical derivation, and approaching information retrieval from the perspective of tags and topics is supported both theoretically and experimentally. The method takes full account of document semantics and word ambiguity.

This thesis also presents a document ranking method based on concept semantic analysis. The method first calculates the semantic relevance of the tag to each document. It then uses the tag topic model to model the documents and obtain the "word-topic" matrix. This matrix is mapped through the semantic relevance of the document to obtain the word contribution. The approach leverages the semantic relevance between the query and the document tags, tying the query and the document closely together. The mapping gives larger weights to words in the document that are more relevant to the query, while words that are less relevant to the query are filtered out, thereby improving the accuracy of document ranking.

Both methods are evaluated on the NTCIR-5 Chinese information retrieval corpus, and the experimental results are assessed with the TREC evaluation tool. The results show that the ranking method based on the tag topic model and the ranking method based on concept semantic analysis both improve the accuracy of retrieval ranking. They also indirectly demonstrate the effectiveness of studying information retrieval from a semantic point of view, as this thesis does.
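
The abstract gives no formulas, so the following is only a minimal sketch of how a "tag-topic" matrix and a "word-topic" matrix could be combined into a word contribution score for ranking. Everything in it is an assumption made for illustration: the function names, the toy probabilities, the choice to form a document's topic mixture by averaging the topic distributions of its tags, and the optional `tag_weight` argument standing in for a query-to-tag semantic relevance. It is not the thesis's actual derivation.

```python
# Hypothetical sketch: rank documents by the topic-based contribution of the
# query words, instead of by term frequency / inverse document frequency.

def doc_topic_mixture(tags, tag_topic, tag_weight=None):
    """Weighted average of the topic distributions of a document's tags."""
    n_topics = len(next(iter(tag_topic.values())))
    weights = [1.0 if tag_weight is None else tag_weight.get(t, 0.0) for t in tags]
    total = sum(weights)
    if total == 0:
        return [0.0] * n_topics  # no relevant tag: the document contributes nothing
    return [
        sum(w * tag_topic[t][k] for t, w in zip(tags, weights)) / total
        for k in range(n_topics)
    ]

def word_contribution(word, mixture, word_topic):
    """Sum over topics of P(topic | doc) * P(word | topic); 0.0 for unknown words."""
    probs = word_topic.get(word)
    if probs is None:
        return 0.0
    return sum(p_topic * p_word for p_topic, p_word in zip(mixture, probs))

def rank(query_words, docs, tag_topic, word_topic, tag_weight=None):
    """Return document ids sorted by descending total contribution of the query words."""
    scores = {}
    for doc_id, tags in docs.items():
        mixture = doc_topic_mixture(tags, tag_topic, tag_weight)
        scores[doc_id] = sum(word_contribution(w, mixture, word_topic) for w in query_words)
    return sorted(scores, key=scores.get, reverse=True)

# Toy data (invented): two topics, two tags, two words, three documents.
tag_topic = {"sports": [0.9, 0.1], "finance": [0.1, 0.9]}    # P(topic | tag)
word_topic = {"goal": [0.30, 0.05], "stock": [0.02, 0.40]}   # P(word | topic), per topic
docs = {"d1": ["sports"], "d2": ["finance"], "d3": ["sports", "finance"]}

print(rank(["stock"], docs, tag_topic, word_topic))  # ['d2', 'd3', 'd1']
```

Passing a `tag_weight` dictionary, for example `{"sports": 1.0, "finance": 0.0}`, mimics the second method in spirit: tags judged irrelevant to the query lose their influence, so documents carrying only those tags drop to the bottom of the ranking.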
Keywords/Search Tags: Information Retrieval, Tag Topic Model, Concept Semantic, Semantic Relevance, Semantic Contribution