Research Of Query Expansion And Search Results Clustering For Web Information Retrieval

Posted on:2011-11-28

Degree:Master

Type:Thesis

Country:China

Candidate:D Fan

Full Text:PDF

GTID:2178330332956553

Subject:Computer software and theory

Abstract/Summary:


In recent years, with the development of Internet technology, Web content has grown rapidly, and retrieving the needed information from this mass of Web data has become a pressing concern. Search engines have therefore become part of everyday life. Today's search engines are increasingly powerful and crawl ever more information, yet users find it harder to pick out what they need, for two main reasons. First, the amount of information that query keywords can convey is limited. Second, a typical search engine returns a processed list of results whose content is huge and unorganized. Because the list does not reflect the internal relationships among the results, users struggle to quickly identify the information they need in the result set.

To address these two problems, this thesis studies two assistive techniques: user query expansion and automatic clustering of search results.

Introducing semantic computing into query expansion is an important research direction. Traditional methods suffer from problems such as a lack of topic knowledge, the introduction of irrelevant words, and inadequate filter functions. This thesis presents a semantic relation tree model that combines topic selection with local feedback and expands queries by category from a semantic perspective. It also improves the word filter function and adds a threshold limit to control noise.

Among clustering algorithms, STC (Suffix Tree Clustering) is recognized as a good algorithm for Web search results. The SHOC and Lingo algorithms, which combine the vector space model (VSM) with the suffix tree document model, consider both the positional information and the statistical properties of words, and they developed well on the foundation of STC.
However, existing clustering algorithms commonly suffer from cluster labels that are hard to read, carry too little information, and discriminate poorly between clusters, and their results do not fully reflect user interest. This thesis presents an improved clustering algorithm, CQIG, which combines the advantages of the vector space model (VSM) with those of the suffix tree model and improves the cluster label and cluster score calculations of Lingo, producing more meaningful, understandable, and discriminating labels. An added step for merging overlapping clusters retains the advantages of overlapping clusters while actively controlling the number of clusters. The final clustering result helps guide the user's choice. The processing of Chinese text is also strengthened.

Finally, a recommendation platform for Web search results clustering was built on the Carrot2 framework. It implements the CQIG, STC, and Lingo algorithms in order to demonstrate the accuracy, distinctiveness, and readability of CQIG and to present the clustering results to the user.
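
The abstract does not say how CQIG merges overlapping clusters, so the following Python sketch only illustrates the general idea under assumed rules. It is not the thesis's method. The function names `overlap` and `merge_overlapping`, the overlap measure (shared documents divided by the size of the smaller cluster), and the 0.5 threshold are all hypothetical choices made for illustration.

```python
from typing import Dict, List, Set


def overlap(a: Set[int], b: Set[int]) -> float:
    """Share of the smaller cluster's documents that also appear in the other.

    This measure is an assumption for illustration, not taken from the thesis.
    """
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


def merge_overlapping(clusters: Dict[str, Set[int]],
                      threshold: float = 0.5) -> List[dict]:
    """Greedily merge clusters whose overlap exceeds `threshold` (assumed value).

    `clusters` maps a cluster label to the set of document ids it contains.
    Each returned entry keeps every label of the clusters merged into it.
    """
    merged = [{"labels": [label], "docs": set(docs)}
              for label, docs in clusters.items()]
    changed = True
    while changed:
        changed = False
        for i in range(len(merged)):
            for j in range(i + 1, len(merged)):
                if overlap(merged[i]["docs"], merged[j]["docs"]) > threshold:
                    # Fold cluster j into cluster i, then rescan from the start.
                    merged[i]["labels"] += merged[j]["labels"]
                    merged[i]["docs"] |= merged[j]["docs"]
                    del merged[j]
                    changed = True
                    break
            if changed:
                break
    return merged


if __name__ == "__main__":
    # Toy search results for the ambiguous query "jaguar" (made-up data).
    demo = {
        "jaguar car": {1, 2, 3},
        "jaguar car price": {2, 3, 4},
        "jaguar animal": {5, 6},
    }
    for cluster in merge_overlapping(demo):
        print(cluster["labels"], sorted(cluster["docs"]))
```

In this sketch a document may still belong to several clusters when their overlap stays at or below the threshold, which mirrors the stated goal of keeping the benefit of overlapping clusters while limiting how many clusters are shown.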

Keywords/Search Tags:

Query Expansion, Semantic Relation Tree, Search Results Clustering, Clustering Quality Assessment, Cluster Label


Related items

1	Chinese Search Results Clustering Research Based On Improved STC
2	Research On Semantics-Based Search Results Clustering Methods
3	Research On Search Results Clustering And Label Extraction
4	Research Of XML Information Retrieval Based On Pseudo-relevance Feedback
5	Extracting Cluster Tags For Search Results Clustering
6	The Study On Web Search Results' Clustering
7	Research On The XML Pseudo Relevance Feedback Technology Based On Clustering Search Results
8	The Research Of Web Text Clustering Based On Ontology
9	Study On Search Results Clustering Algorithm Based On Multi-Core Technology
10	Query Expansion Based On Web Search Results For Sponsored Search