
Research on Document Ranking Based on Tag Filtering

Posted on: 2016-06-08
Degree: Master
Type: Thesis
Country: China
Candidate: D Y Wu
GTID: 2308330464473828
Subject: Computer application technology

Abstract:
Information technology is changing rapidly, and the volume of information on the network is growing quickly. Without the help of search engines, it is increasingly difficult to find the desired content in such vast amounts of information. How to optimize search engine performance and help users identify relevant results more efficiently and accurately is therefore a major challenge in information retrieval today.

This thesis first introduces the state of research and the related background knowledge in information retrieval, and then sets out its own research focus. We improve retrieval performance from two directions: tag filtering and query expansion. On the one hand, mining the semantic information of documents is an important method in text retrieval, and fully exploiting that information contributes substantially to retrieval performance. The Tag-LDA model extends the LDA (Latent Dirichlet Allocation) model by adding a tag (label) layer between the document layer and the topic layer, in order to capture document semantics better. In the label-filtering step, however, Tag-LDA selects text features and extracts labels mainly on the basis of word-frequency information, without considering how well a label distinguishes between document categories. On the other hand, existing query expansion methods rely mostly on manually constructed knowledge bases and ignore the dynamic change of word semantics, so this change is not reflected in expanded retrieval. To address these two shortcomings, Chapters 3 and 4 propose two solutions, respectively.

Chapter 3 proposes a document ranking method based on tag selection. The method first uses mutual information to capture how feature words are distributed across document categories, and then characterizes the uniformity of that distribution by its variance.
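A minimal sketch of the two scoring steps just described, assuming a labeled corpus given as (category, tokens) pairs. The abstract does not give the thesis's exact formulas, so the following choices are assumptions for illustration only: pointwise mutual information is estimated from document frequencies as MI(t, c) = log(A·N / (DF(t)·N_c)), where A is the number of documents in category c containing term t, N the total number of documents, DF(t) the number of documents containing t, and N_c the number of documents in c; uniformity is taken as 1 / (1 + variance of the term's count over the documents of c); and the two are simply multiplied. The function names mutual_information, uniformity and select_tags are hypothetical.

```python
import math

# Hypothetical sketch: MI-based term scoring with a variance-based uniformity factor.
# docs is a list of (category, tokens) pairs; this is NOT the thesis's exact formula.

def mutual_information(term, category, docs):
    """Pointwise MI between a term and a category from document counts:
    MI(t, c) = log(P(t, c) / (P(t) * P(c))) = log(A * N / (DF(t) * N_c))."""
    n = len(docs)
    a = sum(1 for c, toks in docs if c == category and term in toks)  # docs in c containing t
    df_t = sum(1 for _, toks in docs if term in toks)                 # docs containing t
    n_c = sum(1 for c, _ in docs if c == category)                    # docs in c
    if a == 0:
        return 0.0  # term never occurs in this category: treated as uninformative here
    return math.log(a * n / (df_t * n_c))


def uniformity(term, category, docs):
    """1 / (1 + variance of the term's raw count over the documents of the category).
    A term spread evenly across the category gets a value close to 1."""
    counts = [toks.count(term) for c, toks in docs if c == category]
    if not counts:
        return 0.0
    mean = sum(counts) / len(counts)
    var = sum((x - mean) ** 2 for x in counts) / len(counts)
    return 1.0 / (1.0 + var)


def select_tags(doc_tokens, category, docs, k=3):
    """Rank the distinct terms of one document by MI * uniformity and keep the top k."""
    scores = {t: mutual_information(t, category, docs) * uniformity(t, category, docs)
              for t in set(doc_tokens)}
    ranked = sorted(scores, key=lambda t: (-scores[t], t))  # ties broken alphabetically
    return ranked[:k]


docs = [
    ("sports", ["ball", "goal", "team", "goal"]),
    ("sports", ["ball", "team", "coach"]),
    ("tech", ["cpu", "code", "team"]),
    ("tech", ["code", "compiler", "cpu"]),
]
print(select_tags(docs[0][1], "sports", docs, k=2))  # ['ball', 'goal']
```

In this toy corpus "team" also occurs in the tech category, so its MI with "sports" is only log(4/3) and it ranks below "ball"; "goal" is penalized by the uniformity factor because both of its occurrences sit in a single sports document. The position weighting and the Tag-LDA ranking step are not covered by this sketch.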
The method also takes into account the position at which feature words occur. Finally, this information is used to select labels for the documents and to filter out the most representative ones, and the semantic information obtained from the Tag-LDA model is then used for document ranking. Experimental results show that this model improves the effectiveness of information retrieval.

Chapter 4 proposes a document ranking method based on dynamic word contribution. To overcome the deficiencies of existing query expansion methods, it introduces two improvements: (1) the semantic information of words in the semantic knowledge base is calculated and updated dynamically; and (2) this semantic information is incorporated into query expansion and into semantic disambiguation. These measures extend query expansion to dynamic semantic analysis, thereby improving retrieval precision and recall.
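The dynamic-update idea can be illustrated with a small sketch. Everything below is an assumption, since the abstract does not specify how word semantics or word contributions are computed: the hypothetical DynamicCooccurrenceKB class stands in for the semantic knowledge base and recomputes word associations from document-level co-occurrence (the Ochiai/cosine coefficient |D1 ∩ D2| / sqrt(|D1|·|D2|)) whenever new documents arrive, and the hypothetical expand_query function adds the k candidates most associated with the query as a whole, weighted by alpha times that association. Scoring each candidate against all query terms at once, rather than term by term, is used here only as a crude stand-in for word sense disambiguation.

```python
import math
from collections import Counter
from itertools import combinations


class DynamicCooccurrenceKB:
    """Hypothetical stand-in for a semantic knowledge base whose word associations
    change as new documents are added (not the thesis's actual model)."""

    def __init__(self):
        self.df = Counter()       # number of documents containing each word
        self.pair_df = Counter()  # number of documents containing both words of a pair

    def update(self, documents):
        """Fold new tokenized documents into the counts; associations change accordingly."""
        for tokens in documents:
            words = sorted(set(tokens))
            self.df.update(words)
            self.pair_df.update(combinations(words, 2))  # pairs come out in sorted order

    def association(self, w1, w2):
        """Ochiai (cosine) coefficient over the sets of documents containing each word."""
        both = self.pair_df[tuple(sorted((w1, w2)))]
        if both == 0:
            return 0.0
        return both / math.sqrt(self.df[w1] * self.df[w2])


def expand_query(query, kb, k=2, alpha=0.5):
    """Return {term: weight}: original terms get 1.0, the k best expansion terms get
    alpha * (mean association with all query terms)."""
    weights = {term: 1.0 for term in query}
    candidates = {}
    for cand in kb.df:
        if cand in weights:
            continue
        # Averaging over the whole query favours candidates consistent with every
        # query term, a crude stand-in for word sense disambiguation.
        score = sum(kb.association(q, cand) for q in query) / len(query)
        if score > 0:
            candidates[cand] = score
    for cand, score in sorted(candidates.items(), key=lambda kv: (-kv[1], kv[0]))[:k]:
        weights[cand] = alpha * score
    return weights


kb = DynamicCooccurrenceKB()
kb.update([["apple", "fruit", "juice"], ["apple", "iphone", "phone"], ["apple", "fruit", "pie"]])
print(expand_query(["apple"], kb, k=1))  # 'fruit' is the expansion term
kb.update([["apple", "iphone"]] * 3)     # new documents shift the associations of "apple"
print(expand_query(["apple"], kb, k=1))  # 'iphone' is now the expansion term
```

Because associations are recomputed from the current counts, the same query expands differently once the corpus changes. A practical system would also need decay of old evidence, vocabulary limits and a real sense inventory, none of which this sketch attempts.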
Keywords: Search engine, Topic model, Mutual information, Query expansion, Word sense disambiguation