Wikipedia-Based Mining Of Search Results

Posted on: 2012-11-27
Degree: Master
Type: Thesis
Country: China
Candidate: Y Q Zhou
Full Text: PDF
GTID: 2178330338484147
Subject: Computer application technology
Abstract/Summary:
Web search engines now play an important role on the Internet and are used by more and more people because they offer a rapid and direct way of accessing data. In a typical search engine, the search result page is the interface to the user, and the quality of the search results strongly affects the user experience. This thesis focuses on two search result mining problems: query-biased summarization and search result clustering.

A query-biased summary is a brief, query-centered representation of a document. In many scenarios, query-biased summarization can be accomplished by ranking the sentences within a web page with respect to the query. Generating such a summary is difficult, however, because the query and the sentences are short texts that lack the information and background knowledge needed to measure their similarity reliably. This thesis presents a novel approach to query-biased summarization based on Wikipedia knowledge. The experiments show a significant improvement in the quality of the query-biased summaries, and the method performs even better when the query terms occur relatively rarely in the document. Further experiments show how the length of the concept vector affects summarization quality.

For some queries, especially ambiguous ones, the search result list may contain information on multiple topics. Search result clustering is the technique of grouping these results by topic. Traditional clustering methods usually rely on simple text similarity metrics, but they can neither produce a sufficiently meaningful clustering nor provide a human-readable title for each cluster. This thesis presents a new approach to search result clustering that uses Wikipedia knowledge. The method first uses Wikipedia to construct a mapping between the texts and the concepts inside Wikipedia.
The experiment shows that the method achieves a good clustering result, and that non-linear learning techniques such as SVM can further improve it.

Overall, the experiments show that the presented methods are effective in practice and that semantic data from a knowledge base is useful in traditional information retrieval tasks. The thesis also discusses the weaknesses of the current methods and directions for future work in this field.
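
The idea of ranking sentences against a query in a Wikipedia concept space can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the abstract does not give the actual features or weighting, so the tiny term-to-concept index below, the function names, and the `length` parameter (standing in for the concept vector length) are all assumptions made for illustration, loosely following the Explicit Semantic Analysis idea named in the keywords.

```python
import math
import re
from collections import defaultdict

# Hypothetical toy index: term -> {Wikipedia concept: association weight}.
# A real system would derive these weights from Wikipedia article text.
TERM_TO_CONCEPTS = {
    "jaguar": {"Jaguar (animal)": 0.9, "Jaguar Cars": 0.8},
    "cat": {"Jaguar (animal)": 0.6, "Felidae": 0.9},
    "predator": {"Jaguar (animal)": 0.5, "Felidae": 0.4},
    "engine": {"Jaguar Cars": 0.7, "Automobile": 0.8},
    "sedan": {"Jaguar Cars": 0.5, "Automobile": 0.9},
    "car": {"Jaguar Cars": 0.6, "Automobile": 0.9},
}


def concept_vector(text, length=3):
    """Map a short text to its `length` highest-weighted concepts."""
    scores = defaultdict(float)
    for term in re.findall(r"[a-z]+", text.lower()):
        for concept, weight in TERM_TO_CONCEPTS.get(term, {}).items():
            scores[concept] += weight
    top = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return dict(top[:length])


def cosine(u, v):
    """Cosine similarity between two sparse concept vectors."""
    dot = sum(w * v.get(c, 0.0) for c, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_sentences(query, sentences, length=3):
    """Return (score, sentence) pairs, most query-relevant first."""
    q = concept_vector(query, length)
    scored = [(cosine(q, concept_vector(s, length)), s) for s in sentences]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


if __name__ == "__main__":
    page = [
        "The big cat is a predator.",
        "The new sedan was shown today.",
        "Weather was mild.",
    ]
    for score, sentence in rank_sentences("car engine", page):
        print(f"{score:.3f}  {sentence}")
```

In this toy example the sentence about the sedan shares no word with the query "car engine", yet it is ranked first because both map to the same concepts. This is the kind of case the abstract highlights, where query terms rarely occur in the document.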
Keywords/Search Tags: Query-biased summarization, Search result clustering, Wikipedia, Explicit Semantic Analysis