Query Expansion Based On User Log Clustering

Posted on: 2011-05-20
Degree: Master
Type: Thesis
Country: China
Candidate: S F Jia
Full Text: PDF
GTID: 2178360308982476
Subject: Signal and Information Processing
Abstract/Summary:
With the rapid development of the Internet, search engines have become an important way for users to retrieve the information they need. Research has pointed out that the average length of Chinese queries is 1.8 words, shorter than the 2.35 words reported for English queries by Craig Silberstein. Of all queries in the logs, 93.15% have a length of 3 or less. This shows that Chinese search engines receive less information from users, so the query must be processed further to understand the user's intention. Query expansion addresses this problem by adding associated information to the original query, forming a longer and more accurate input.

In this study, we first present a simple and fast query expansion method based on the physical distance between words, in which the weight vector is calculated from the order of the candidate terms relative to the original query. Two factors limit its results: the quality of pseudo feedback, and automatic term recognition during Chinese word segmentation.

To solve these problems, we present a novel query expansion algorithm that clusters real user logs. Because not all clicked pages are suitable for query expansion, we de-noise the clicked results by reliability to improve performance. After HTML tags are removed, the page body contents are clustered, and the cluster centers cover various aspects of the original query. The terms used in log queries offer a better choice of features, from the user's point of view, for summarizing the web pages that were clicked from these queries. Therefore, the associated queries, reverse queries, webpage titles and keyword phrases are combined with the cluster centers to obtain high-quality expansion terms for new queries.
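As an illustration of distance-based expansion, the following is a minimal sketch only, not the thesis's actual method. It assumes documents are already segmented into tokens, and it assumes an inverse-distance weight within a fixed window; the function name `proximity_expansion` and the parameters `window` and `top_k` are hypothetical.

```python
from collections import defaultdict


def proximity_expansion(query_terms, documents, window=5, top_k=3):
    """Rank candidate expansion terms by how close they occur to query terms.

    Assumption: each document is a list of tokens (segmentation already done),
    and a term at distance d from the nearest query term contributes 1/d.
    """
    query_set = set(query_terms)
    scores = defaultdict(float)
    for tokens in documents:
        # Positions where any query term occurs in this document.
        hits = [i for i, tok in enumerate(tokens) if tok in query_set]
        if not hits:
            continue
        for i, tok in enumerate(tokens):
            if tok in query_set:
                continue
            # Distance in tokens to the nearest query-term occurrence.
            dist = min(abs(i - h) for h in hits)
            if dist <= window:
                scores[tok] += 1.0 / dist  # closer terms weigh more
    # Highest score first; ties broken alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_k]


if __name__ == "__main__":
    docs = [
        ["apple", "iphone", "price", "in", "china"],
        ["apple", "iphone", "review"],
    ]
    print(proximity_expansion(["apple"], docs))
```

In a pipeline like the one the abstract describes, the `documents` argument would presumably be the de-noised body text of clicked pages, and the ranked terms would be merged with terms drawn from cluster centers, associated queries and page titles.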
Keywords/Search Tags:Query expansion, log mining, LSI clustering, Baike terminology extraction, webpage de-noising