
The Research And Implementation Of Expansion Algorithm Based On Query Log

Posted on: 2014-01-25    Degree: Master    Type: Thesis
Country: China    Candidate: X Y Ding    Full Text: PDF
GTID: 2248330395977482    Subject: Computer software and theory
Abstract/Summary:
With the rapid development of Internet technology, the amount of information on the Internet is growing explosively, and the number of Internet users is growing continuously as well. Helping users find the information they need within this vast volume is therefore an important research subject in Information Retrieval. On the one hand, the average query length in Chinese is shorter than in English, so a Chinese search engine obtains much less information from its users; on the other hand, synonyms and polysemes in Chinese lead to mismatches between the words in a query and the words in documents. Both factors cause errors in the results of most keyword-based Chinese search engines, so many of them cannot meet users' needs today. Query expansion technologies have emerged to address these problems.

This thesis proposes a Local Co-occurrence Query Expansion Algorithm Based on Query Log (LCQEBQL). First, the algorithm uses an improved edit distance vector algorithm together with user behavior information to obtain a more relevant collection of related user documents. Second, it adds a Named Entity library so that a named entity is not split into multiple meaningless words when documents or the query word collection are segmented, which makes segmentation more accurate. Third, when filtering the user document collection, it considers three factors at the same time (empty links, navigation pages, and the similarity between documents and the query words collection), which eliminates irrelevant user documents and improves the algorithm's performance. Fourth, it uses local co-occurrence analysis to calculate the similarity between terms in the user documents and terms in the related query words collection, while also taking into account the weight of URL links in the query log and the position information of the HTML document structure, which further improves performance. Finally, when recalculating the weights of the expansion words, it adds information from the related query words collection so that the weights are more accurate.

The experimental section of this thesis uses URL links in the Sogou log to extract 1000 pages from different fields, filters these pages and saves them as an experimental test set, and designs a prototype system for the experimental evaluation of LCQEBQL and other algorithms. The experiments show that LCQEBQL is more effective than the other algorithms and that its search results are more relevant.
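The abstract does not give the formulas behind LCQEBQL, so the following is only a minimal sketch of the general idea of local co-occurrence query expansion, not the thesis's algorithm. It assumes documents are already segmented into token lists, and it scores each candidate term by how close it appears to a query term inside a fixed window. The function names (`cooccurrence_scores`, `expand_query`), the `window` parameter, and the inverse-distance weighting are all illustrative assumptions; the URL-link weights, HTML position information, and Named Entity handling described above are omitted.

```python
from collections import Counter


def cooccurrence_scores(query_terms, docs, window=5):
    """Score candidate terms by windowed co-occurrence with query terms.

    docs is a list of token lists (assumed to be pre-segmented).
    Each time a candidate appears within `window` tokens of a query
    term, it earns 1 / distance (an assumed weighting, not the thesis's).
    """
    query_set = set(query_terms)
    scores = Counter()
    for tokens in docs:
        for i, tok in enumerate(tokens):
            if tok not in query_set:
                continue
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                cand = tokens[j]
                if j != i and cand not in query_set:
                    scores[cand] += 1.0 / abs(j - i)
    return scores


def expand_query(query_terms, docs, k=3, window=5):
    """Return the k highest-scoring candidate expansion terms."""
    scores = cooccurrence_scores(query_terms, docs, window)
    # Sort by score (descending), then alphabetically so ties are stable.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


if __name__ == "__main__":
    # Toy "user documents" standing in for pages clicked in a query log.
    docs = [
        ["apple", "iphone", "price"],
        ["apple", "iphone", "review"],
        ["banana", "fruit"],
    ]
    print(expand_query(["apple"], docs, k=2))
```

In this toy run, terms that never occur near the query term (here `banana` and `fruit`) receive no score at all. In a fuller system this is roughly where document filtering and per-document weights from the query log would enter.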
Keywords/Search Tags: Search Engine, Query Expansion, Query Log, Local Co-occurrence