
Research On Subtopic Mining For Diversified Information Retrieval

Posted on: 2015-02-14
Degree: Master
Type: Thesis
Country: China
Candidate: C Guo
GTID: 2298330467968641
Subject: Computer application technology
Abstract/Summary:
With the rapid development of the information age, Internet data has shown "blowout" growth. As one of the most successful applications of information retrieval on the Internet, search engines have become an indispensable tool for users seeking information. However, because of efficiency and concurrency constraints on system performance, current search engines are still keyword-based. Furthermore, the same query submitted by different users may carry different intents. To resolve vagueness and ambiguity in user queries, capture query intent explicitly, and meet the diversified needs of different users, this thesis first analyzes the user's query and then proposes latent diversified subtopic mining methods, fully considering the impact of relevant documents and query logs on subtopic diversity.

Firstly, the diversified subtopic mining method is analyzed. This thesis proposes an LCS-based frequent sequence mining algorithm to extract candidate subtopics from relevant document fragments. HowNet and query logs are then used to cluster and rank the subtopic list. The experimental results show that the method is effective in mining subtopics and clustering query intents.

Secondly, this thesis proposes an unsupervised subtopic mining approach. First, the method uses the ATF×PDF model to extract candidate topic words from the set of relevant document fragments; then, to ensure the diversity of the subtopics, it clusters the candidate topic words into latent topics using HowNet-based semantic similarity; finally, it applies an LCS-based subtopic combination and sorting algorithm to generate diversified subtopics.
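The candidate-extraction step can be illustrated with a minimal sketch. It assumes that LCS means longest common subsequence over whitespace-separated tokens (the abstract does not expand the acronym) and that a sequence counts as "frequent" when it is the LCS of at least `min_support` fragment pairs. The function names, the thresholds, and the sample fragments are hypothetical and are not taken from the thesis.

```python
from collections import Counter
from itertools import combinations


def lcs(a, b):
    """Return one longest common subsequence of two token lists."""
    m, n = len(a), len(b)
    # dp[i][j] holds the LCS length of the suffixes a[i:] and b[j:].
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m - 1, -1, -1):
        for j in range(n - 1, -1, -1):
            if a[i] == b[j]:
                dp[i][j] = dp[i + 1][j + 1] + 1
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])
    # Walk the table from the start to recover the tokens.
    out, i, j = [], 0, 0
    while i < m and j < n:
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            i += 1
        else:
            j += 1
    return out


def candidate_subtopics(fragments, min_support=2, min_len=2):
    """Count pairwise LCS sequences and keep the frequent ones (assumed criterion)."""
    tokenized = [f.lower().split() for f in fragments]
    counts = Counter()
    for x, y in combinations(tokenized, 2):
        seq = tuple(lcs(x, y))
        if len(seq) >= min_len:
            counts[seq] += 1
    return [(" ".join(s), c) for s, c in counts.most_common() if c >= min_support]


# Hypothetical document fragments retrieved for the ambiguous query "jaguar".
fragments = [
    "jaguar car price list",
    "new jaguar car price",
    "jaguar car price in uk",
    "jaguar animal habitat",
]
print(candidate_subtopics(fragments))
```

In this toy input, "jaguar car price" is shared by three fragment pairs and survives the support threshold, while the single shared token "jaguar" is dropped by the length threshold. Clustering with HowNet and ranking with query logs are not covered by this sketch.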
The experimental results show that the average I-rec@10, D-nDCG@10, and D#-nDCG@10 reach 0.5745, 0.5714, and 0.573 respectively, which indicates that this method is effective at making the query meaning explicit.

Finally, a subtopic mining system for diversified information retrieval is designed and implemented. Users can enter a query and retrieve its subtopics; the system mines the query subtopics and returns diversified search results.
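The three reported scores can be cross-checked with a short sketch. It assumes the standard definitions: I-rec@k is the fraction of a query's known intents covered by the top k results, and D#-nDCG is the linear combination γ·I-rec + (1−γ)·D-nDCG. The abstract does not state γ; 0.5 is assumed here as the commonly used setting. The function names and the sample ranking are hypothetical.

```python
def i_rec(ranked_intents, num_intents, k=10):
    """Intent recall: share of known intents covered by the top-k items.

    ranked_intents is a list of sets, one per ranked item, holding the
    intent ids that the item is relevant to.
    """
    covered = set()
    for intents in ranked_intents[:k]:
        covered |= intents
    return len(covered) / num_intents


def d_sharp_ndcg(i_rec_score, d_ndcg_score, gamma=0.5):
    """D#-nDCG as a weighted mix of I-rec and D-nDCG (gamma is assumed)."""
    return gamma * i_rec_score + (1.0 - gamma) * d_ndcg_score


# Hypothetical ranking: three items covering intents 1 and 2 out of 4.
print(i_rec([{1}, {1, 2}, set()], num_intents=4))

# Values reported in the abstract; averaging is linear, so the same
# combination holds for the per-query means.
print(round(d_sharp_ndcg(0.5745, 0.5714), 3))
```

With γ = 0.5 the combination gives 0.57295, which rounds to the reported 0.573, so the three figures are mutually consistent under that assumption.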
Keywords/Search Tags: Information Retrieval, Intent, Diversity, Subtopic Mining, Latent Topic