Wikipedia-Based Mining Of Search Results

Posted on:2012-11-27

Degree:Master

Type:Thesis

Country:China

Candidate:Y Q Zhou

GTID:2178330338484147

Subject:Computer application technology

Abstract/Summary:

Web search engines play an important role on the Internet and attract a growing number of users because they offer fast, direct access to data. In a typical search engine, the search results are the interface to the user, and their quality strongly affects the user experience. This thesis focuses on two search result mining problems: query-biased summarization and search result clustering.

A query-biased summary is a brief, query-centered representation of a document. In many scenarios, it can be produced by ranking the sentences of a web page with respect to the query. Generating such a summary is difficult, however, because the query and the sentences are short texts that carry little information or background knowledge, which makes their similarity hard to measure. This thesis presents a novel approach to query-biased summarization based on Wikipedia knowledge. Experiments show a significant improvement in summary quality, and the method performs even better when the query terms occur relatively rarely in the document. Further experiments show how the length of the concept vector affects summarization quality.

For some queries, especially ambiguous ones, the search result list may contain information on multiple topics. Search result clustering groups these results by topic. Traditional clustering methods usually rely on simple text similarity metrics, and they can provide neither a sufficiently meaningful clustering nor a human-readable title for each cluster. This thesis presents a new approach to search result clustering that uses Wikipedia knowledge. The method first uses Wikipedia to construct a mapping between the texts and Wikipedia concepts. Experiments show that the method achieves good clustering results and that non-linear learning techniques such as SVM can further improve them.

The experiments show that the presented methods are effective in practice and that semantic data from a knowledge base is useful in traditional information retrieval tasks. The thesis also discusses the weaknesses of the current methods and future work in this field.
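
The two ideas in the abstract, ranking sentences against a query in a Wikipedia concept space and grouping results under a readable concept title, can be illustrated with a minimal sketch. This is not the thesis's implementation: the two-entry `CONCEPT_INDEX`, its weights, and the function names `concept_vector`, `rank_sentences`, and `cluster_by_top_concept` are all hypothetical, and labeling a cluster by its highest-weighted concept is an assumption made only for illustration. A real Explicit Semantic Analysis index would hold TF-IDF weights computed from Wikipedia articles.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical toy stand-in for a Wikipedia concept index:
# concept title -> {term: weight}. Real weights would come from Wikipedia text.
CONCEPT_INDEX = {
    "Jaguar (animal)": {"jaguar": 1.0, "cat": 0.8, "jungle": 0.6, "prey": 0.5},
    "Jaguar Cars": {"jaguar": 1.0, "car": 0.9, "engine": 0.7, "sedan": 0.5},
}


def concept_vector(text):
    """Map a short text to weights over concepts by summing matched term weights."""
    counts = Counter(re.findall(r"[a-z]+", text.lower()))
    vec = {}
    for concept, terms in CONCEPT_INDEX.items():
        score = sum(counts[term] * weight for term, weight in terms.items())
        if score > 0:
            vec[concept] = score
    return vec


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank_sentences(query, sentences):
    """Order sentences by concept-space similarity to the query, best first."""
    q = concept_vector(query)
    return sorted(sentences, key=lambda s: cosine(q, concept_vector(s)), reverse=True)


def cluster_by_top_concept(snippets):
    """Group snippets under their highest-weighted concept title (an assumption)."""
    clusters = defaultdict(list)
    for snippet in snippets:
        vec = concept_vector(snippet)
        label = max(vec, key=vec.get) if vec else "Other"
        clusters[label].append(snippet)
    return dict(clusters)


if __name__ == "__main__":
    sentences = [
        "the cat hunts prey in the jungle",
        "the sedan is a fast car",
        "the weather is nice",
    ]
    for sentence in rank_sentences("jaguar engine", sentences):
        print(sentence)
    print(cluster_by_top_concept(["jaguar car dealer", "jaguar cat habitat", "stock prices"]))
```

In this toy setup, the sentence about the sedan ranks first for the query "jaguar engine" even though it shares no word with the query, because both map to the same concept. That is the kind of case the abstract describes, where query terms occur rarely in the document.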

Keywords/Search Tags:

Query-biased summarization, Search result clustering, Wikipedia, Explicit Semantic Analysis

Related items

1	Research On Web Search Result Clustering With Wikipedia Disambiguation Pages
2	A Study On Improving The Methods Of WEB Query Classification
3	Research On Query Processing And Result Caching In Search Engine
4	Query Conceptual Graph Guided Web Search Result Analysis
5	Classification Of Query Intent Using Encyclopedia Knowledge And Statistical Method
6	Study On The Technology And Application Of Biased Summarization
7	A Comparative Study Of Explicit Search Result Diversification Algorithms
8	Semantic Web Search Technology Based On Wikipedia
9	Pre-query Processing For Semantic-oriented Web Search
10	The Study Of Latent Semantic-Based Personalized Search Key Technology