
Research Of Query Expansion And Search Results Clustering For Web Information Retrieval

Posted on: 2011-11-28
Degree: Master
Type: Thesis
Country: China
Candidate: D Fan
GTID: 2178330332956553
Subject: Computer software and theory
Abstract/Summary:
In recent years, as Internet technology has developed, the amount of information on the Web has grown rapidly. How to retrieve the needed information from this mass of Web content has become a pressing concern, and search engines have therefore become part of everyday life. Today's search engines are increasingly powerful and crawl an ever larger volume of information. Nevertheless, users seem to find it harder to pick out the information they need, for two main reasons. First, the amount of information that a few keywords can convey is limited. Second, a typical search engine returns a processed list of results whose content is large and unorganized. Because the list lacks information reflecting the internal connections among the results, it is very difficult for users to quickly locate what they need in the result set.

To address these two problems, this thesis studies two assistive techniques: user query expansion and automatic search results clustering.

Introducing semantic computing into query expansion is an important research direction. Traditional methods suffer from problems such as a lack of topic knowledge, the introduction of irrelevant words, and improper filter functions. This thesis presents a semantic relation tree model that combines topic selection with local feedback and expands queries by category from a semantic perspective. It also improves the word filter function and adds a threshold limit to control noise.

Among clustering algorithms, STC (Suffix Tree Clustering) is recognized as a good algorithm for clustering Web search results. The SHOC and Lingo algorithms, which combine the vector space model (VSM) with the suffix tree document model, consider both the positional information of words and their statistical properties, and were developed well on the foundation of STC.
However, existing clustering algorithms commonly suffer from cluster labels that are hard to read, carry too little information, and discriminate poorly between clusters, and the clustering results do not fully reflect user interest. This thesis presents an improved clustering algorithm, CQIG, which combines the advantages of the vector space model (VSM) with those of the suffix tree model and, building on Lingo, improves the methods for generating cluster labels and computing cluster scores, producing more meaningful, understandable, and discriminating labels. It adds a step that merges overlapping clusters, which retains the benefits of overlapping clusters while actively controlling the number of clusters. The final clustering result helps guide the user's choice. Processing of Chinese text is also strengthened.

Finally, a recommendation platform for Web search results clustering was built on the Carrot2 framework. It implements the CQIG, STC, and Lingo algorithms in order to demonstrate the accuracy, discrimination, and readability of CQIG and to show the clustering results to the user.
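
The abstract does not specify how CQIG merges overlapping clusters. As a rough illustration of the general idea only, the sketch below uses the classic STC merge rule: two clusters are linked when they share more than half of each other's documents, and each connected group of linked clusters becomes one merged cluster. The function name `merge_overlapping_clusters`, the `threshold` parameter, and the sample document ids are hypothetical and do not come from the thesis.

```python
from itertools import combinations


def merge_overlapping_clusters(clusters, threshold=0.5):
    """Merge clusters (sets of document ids) that overlap heavily.

    Assumption: this follows the classic STC base-cluster rule, not
    necessarily the CQIG procedure described in the thesis.
    """
    # Union-find structure: parent[i] points toward the group representative.
    parent = list(range(len(clusters)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(clusters)), 2):
        a, b = clusters[i], clusters[j]
        if not a or not b:
            continue  # an empty cluster cannot overlap with anything
        shared = len(a & b)
        # Link the two clusters only if the overlap is large on both sides.
        if shared / len(a) > threshold and shared / len(b) > threshold:
            parent[find(i)] = find(j)

    # Collect the documents of every linked group into one merged cluster.
    merged = {}
    for i, docs in enumerate(clusters):
        merged.setdefault(find(i), set()).update(docs)
    return list(merged.values())


if __name__ == "__main__":
    # Hypothetical document ids: the first two clusters share 2 of 3 documents.
    sample = [{1, 2, 3}, {2, 3, 4}, {7, 8}]
    print(merge_overlapping_clusters(sample))
```

A higher `threshold` merges fewer clusters and keeps more overlap visible to the user. A lower one reduces the number of clusters more aggressively.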
Keywords/Search Tags:Query Expansion, Semantic Relation Tree, Search Results Clustering, Clustering Quality Assessment, Cluster Label