Full Text Retrieval Research Based On Semantic Analysis Of Tag-LDA Model

Posted on: 2016-01-06
Degree: Master
Type: Thesis
Country: China
Candidate: L J Hua
Full Text: PDF
GTID: 2308330464472625
Subject: Computer application technology
Abstract/Summary:
In the 21st century, the world is shaped by information technology, digitization, and networking, and the amount of text on the network is growing at an unprecedented rate. Retrieving the information users need from such massive data has become an urgent problem. A traditional information retrieval system matches the keywords a user submits against documents that contain the same keywords. Results obtained this way cannot meet users' needs, for two reasons. First, keyword matching does not mine the deeper semantic information of the text. Second, the keywords users submit are often too short to express their true intent. To address these two points, this paper proposes an improved strategy: we adopt the Tag-LDA topic model to mine text semantics and expand the query.

This paper presents a text retrieval method based on vector space feature conversion. Knowledge-based semantic fingerprint information and the semantic information obtained by the Tag-LDA topic model are two different representations of the semantic features of the same text, yet the two representations are not compatible with each other. We introduce the vector space as a bridge to transfer the knowledge-based semantic fingerprint information space into the Tag-LDA model space, and we prove the rationality of the conversion process with correlation theory. Incorporating the converted semantic fingerprint information into the Tag-LDA model yields a new topic model, STag-LDA. The STag-LDA model has a certain disambiguation effect on the semantic information of tags, so it can mine text semantic information more accurately and improve retrieval efficiency.

This paper also presents a document re-ranking method based on query expansion over the initial retrieval results. The method extracts the concepts of the target documents with the Tag-LDA topic model and expands the initial query using the tag distributions of the documents retrieved in the first search. First, we use the Tag-LDA topic model to model the initially retrieved documents and obtain a "document-tag" probability distribution matrix. Second, we treat the top k documents as relevant to the user query, so all tags in the "document-tag" matrix rows of the top k documents form an initial tag set. Third, we use Wikipedia to generate concept relation graphs and filter out the tags that are not related to the query subject, which gives a new tag distribution that represents the query subject. Finally, we calculate the relationship between the query subject and each document based on the tag distribution. The method filters out noise in the relevant documents to improve retrieval efficiency.

This paper verifies the two methods on the NTCIR-5 Chinese information retrieval corpus and uses the TREC evaluation tool to evaluate the relevant indicators. Experimental results show that both proposed methods effectively improve retrieval accuracy. They also indirectly confirm that mining the semantic information of text and clarifying the query intent are very important.
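The four re-ranking steps described above can be illustrated with a minimal Python sketch. It is not the thesis implementation. Several parts are assumptions: the "document-tag" distributions are taken as given (no Tag-LDA model is trained here), the Wikipedia concept relation graph is replaced by a hypothetical precomputed set `related_tags`, the query tag distribution is formed by averaging over the top k documents, and cosine similarity stands in for the unspecified relationship measure. The function name `rerank` and all parameter names are illustrative.

```python
import math


def rerank(doc_tag, initial_ranking, related_tags, k=2):
    """Re-rank documents by similarity to a query tag distribution.

    doc_tag:         {doc_id: {tag: probability}}, assumed to come from a
                     Tag-LDA model (not implemented in this sketch).
    initial_ranking: list of doc_ids from the first search, best first.
    related_tags:    tags judged related to the query subject; a stand-in
                     for the Wikipedia concept relation graph filter.
    k:               number of top documents treated as relevant.
    """
    top_k = initial_ranking[:k]

    # Step 2: all tags of the top-k documents form the initial tag set.
    initial_tags = {tag for doc in top_k for tag in doc_tag[doc]}

    # Step 3: drop tags not related to the query subject.
    kept = initial_tags & related_tags

    # Assumption: the query tag distribution is the mean tag probability
    # over the top-k documents, renormalized over the kept tags.
    weights = {
        tag: sum(doc_tag[doc].get(tag, 0.0) for doc in top_k) / len(top_k)
        for tag in kept
    }
    total = sum(weights.values())
    if total == 0:
        # Nothing survived the filter; fall back to the initial ranking.
        return list(initial_ranking)
    query = {tag: w / total for tag, w in weights.items()}

    def cosine(p, q):
        # Cosine similarity between two sparse tag distributions.
        dot = sum(p.get(tag, 0.0) * v for tag, v in q.items())
        norm_p = math.sqrt(sum(v * v for v in p.values()))
        norm_q = math.sqrt(sum(v * v for v in q.values()))
        return dot / (norm_p * norm_q) if norm_p and norm_q else 0.0

    # Step 4: score every document against the query tag distribution.
    scores = {doc: cosine(doc_tag[doc], query) for doc in initial_ranking}

    # sorted() is stable, so ties keep their initial order.
    return sorted(initial_ranking, key=lambda doc: scores[doc], reverse=True)


if __name__ == "__main__":
    # Toy data, invented purely for illustration.
    doc_tag = {
        "d1": {"retrieval": 0.6, "query": 0.3, "football": 0.1},
        "d2": {"retrieval": 0.5, "football": 0.5},
        "d3": {"football": 0.9, "retrieval": 0.1},
        "d4": {"retrieval": 0.7, "query": 0.3},
    }
    print(rerank(doc_tag, ["d1", "d3", "d4", "d2"],
                 {"retrieval", "query", "semantic"}, k=2))
```

In this toy run the off-topic tag "football" enters the initial tag set through document d3 but is removed by the filter, so documents dominated by that tag drop in the final ranking. A real system would replace `related_tags` with a lookup against the concept relation graph.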
Keywords/Search Tags: Semantic Mining, Information Retrieval, Tag-LDA Model, Concept Relation Graphs, Query Expansion