
Studies on Query Expansion Based on Item-All-Weighted Association Rules Mining

Posted on: 2008-10-15 | Degree: Master | Type: Thesis
Country: China | Candidate: M X Huang
GTID: 2178360215983338 | Subject: Computer application technology
Abstract/Summary:
With the explosive growth of information, information overload has become one of the main problems of recent years. As a result, how to find the desired information efficiently and accurately has become a hot research topic in Information Retrieval. However, most information retrieval systems (e.g., web search engines) rely largely on Boolean query techniques and keyword-based symbol matching, which cause difficult problems in querying information, such as information mazes, information overload, and term mismatch. These problems give information retrieval systems poor recall and precision. To address them, query expansion emerged and developed rapidly, attracting wide attention and in-depth study from experts and scholars around the world, and many query expansion models have been proposed from various angles. Nevertheless, the problem of recall and precision has not yet been solved satisfactorily, let alone the semantic deviation of retrieval results from the user's query intention and the ambiguity among query terms. After analyzing the limitations of traditional query expansion algorithms and studying association rule mining techniques, this thesis proposes a new query expansion algorithm based on item-all-weighted association rule mining and obtains several significant research results. The themes of this dissertation are the algorithm for mining item-all-weighted association rules and its application to query expansion. Three aspects are researched: the discovery of item-all-weighted association rules; local-feedback query expansion based on item-all-weighted association rule mining of terms in the top-ranked retrieved documents; and relevance-feedback query expansion based on both users' clicking and browsing behaviors and item-all-weighted association rule mining of terms in the retrieved relevant documents.
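To make the idea of "item-all-weighted" mining concrete, the sketch below treats each document as a transaction whose terms carry weights (for example normalised tf-idf values). The abstract does not give the thesis's exact support and confidence formulas or the details of its MAWAR pruning, so everything here is an assumption: `aw_support` uses a hypothetical mean-weight support, `aw_rules` uses a ratio of weighted supports as confidence, and the pruning is a plain Apriori-style bound, not the thesis's threefold strategy. It relies on weights lying in (0, 1], so weighted support never exceeds ordinary count support, and count support is anti-monotone.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, List, Tuple

Doc = Dict[str, float]  # term -> weight in (0, 1] (assumed, e.g. normalised tf-idf)
Rule = Tuple[FrozenSet[str], FrozenSet[str], float, float]  # (antecedent, consequent, support, confidence)


def aw_support(itemset: Iterable[str], docs: List[Doc]) -> float:
    """Hypothetical all-weighted support: for every document that contains all terms
    of the itemset, add the mean weight of those terms; divide by the number of docs."""
    items = list(itemset)
    total = 0.0
    for weights in docs:
        if all(t in weights for t in items):
            total += sum(weights[t] for t in items) / len(items)
    return total / len(docs)


def count_support(itemset: Iterable[str], docs: List[Doc]) -> float:
    """Ordinary (unweighted) support: fraction of documents containing every term."""
    items = list(itemset)
    return sum(1 for w in docs if all(t in w for t in items)) / len(docs)


def mine_aw_itemsets(docs: List[Doc], min_sup: float) -> Dict[FrozenSet[str], float]:
    """Level-wise mining. Since weights are <= 1, aw_support(I) <= count_support(I),
    so a candidate whose count support is below min_sup is dropped together with all
    of its supersets. Candidates that pass this bound are kept for extension even if
    their weighted support is too low to be reported."""
    terms = sorted({t for w in docs for t in w})
    level = [frozenset([t]) for t in terms if count_support([t], docs) >= min_sup]
    frequent: Dict[FrozenSet[str], float] = {}
    k = 1
    while level:
        for itemset in level:
            s = aw_support(itemset, docs)
            if s >= min_sup:  # report only itemsets whose weighted support qualifies
                frequent[itemset] = s
        k += 1
        prev = set(level)
        candidates = set()
        for a, b in combinations(level, 2):
            union = a | b
            # Join step plus subset check against the previous count-frequent level
            if len(union) == k and all(frozenset(sub) in prev for sub in combinations(union, k - 1)):
                candidates.add(union)
        level = [c for c in candidates if count_support(c, docs) >= min_sup]
    return frequent


def aw_rules(frequent: Dict[FrozenSet[str], float], docs: List[Doc], min_conf: float) -> List[Rule]:
    """Hypothetical confidence: aw_support(A | B) / aw_support(A).
    Under this assumed measure the ratio is not guaranteed to stay below 1."""
    rules: List[Rule] = []
    for itemset, sup in frequent.items():
        for r in range(1, len(itemset)):
            for ante_tuple in combinations(sorted(itemset), r):
                ante = frozenset(ante_tuple)
                conf = sup / aw_support(ante, docs)
                if conf >= min_conf:
                    rules.append((ante, itemset - ante, sup, conf))
    return rules


# Toy weighted "documents" (made-up data for illustration only)
example_docs: List[Doc] = [
    {"apple": 0.8, "fruit": 0.6},
    {"apple": 0.8, "fruit": 0.6, "pie": 0.5},
    {"fruit": 1.0, "pie": 0.1},
    {"apple": 0.5},
]
```

With these toy documents and `min_sup = 0.3`, the itemsets reported are {apple}, {fruit} and {apple, fruit}. The term "pie" shows why weighted and unweighted support differ: it appears in half the documents (count support 0.5), so it survives as a candidate, but its low weights give it a weighted support of only 0.15.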
The contributions of this thesis include:
(1) The characteristics and limitations of existing algorithms for association rule mining, weighted association rule mining, item-all-weighted association rule mining, and query expansion are studied and analyzed in depth, and the state of the art in query expansion is summarized systematically.
(2) A new item-all-weighted association rule mining algorithm based on a threefold pruning strategy (MAWAR) is proposed, together with a related theorem and its proof. The pruning strategy substantially reduces the number of candidate itemsets and the mining time, improving mining efficiency. Experimental results show the effectiveness of the algorithm and confirm that its mining efficiency improves on that of existing algorithms.
(3) A novel local-feedback query expansion algorithm based on item-all-weighted association rule mining is proposed, combining association rule mining with query expansion. Using the MAWAR strategy, the algorithm automatically mines all-weighted association rules related to the original query from the top-ranked retrieved documents, builds an association rule database from them, and extracts expansion terms related to the original query from that database. Experimental results show that the method outperforms traditional methods in average precision.
(4) Building on the local-feedback query expansion algorithm based on item-all-weighted association rule mining, a mining method based on fourfold (quadruple) pruning is given, which greatly improves mining efficiency: experiments show that its mining time is reduced by 87.84% compared with the original method. A new method for computing the weights of expansion terms is also given, which makes the weight assigned to each expansion term more reasonable.
(5) The effects of support, confidence, and the number of expansion terms on query expansion retrieval performance are studied experimentally. The results show that retrieval performance is jointly affected by multiple factors rather than determined by any single factor.
(6) To apply association rule mining to query expansion more effectively and to identify better query expansion models, 4 categories of query expansion models comprising 13 variants are presented based on item-all-weighted association rule mining. Their retrieval performance is compared experimentally, and several better-performing models are identified.
(7) A new relevance-feedback query expansion algorithm is proposed based on users' clicking and browsing behaviors together with all-weighted association rule mining of terms in the retrieved relevant documents, and the corresponding query expansion retrieval system is designed. From the duration of a user's clicking and browsing, or from behaviors such as downloading, the system determines whether a document is relevant to the user's query intention and interests, provided the user's querying habits remain consistent. The algorithm automatically extracts item-all-weighted association rules related to the original query from the retrieved relevant documents to build an association rule database, and collects terms related to the original query from that database as expansion terms. Experimental results show the effectiveness of the algorithm: its retrieval performance is markedly better than that of existing algorithms.
(8) A query expansion prototype system is designed and implemented. The design principles of the prototype and its data structures, the construction of its main modules, and the writing of the specific code are studied. Experiments are carried out with the prototype to evaluate the algorithms proposed in this thesis, and a significance test is conducted.
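The local-feedback expansion step described in contributions (3) and (4) can be sketched as follows: mine rules from the top-ranked documents, keep those whose antecedent lies within the original query and whose consequent introduces new terms, and append the best consequent terms to the query. The abstract does not specify the thesis's expansion-term weighting, so `select_expansion_terms` and `expand_query` are hypothetical helpers that score each candidate by the highest confidence of any matching rule and down-weight it by an assumed factor `beta`; the rules in the usage note are made-up values rather than mined output.

```python
from typing import Dict, FrozenSet, List, Tuple

# (antecedent, consequent, support, confidence); in practice these would be mined
# from the top-ranked retrieved documents (assumption for this sketch).
Rule = Tuple[FrozenSet[str], FrozenSet[str], float, float]


def select_expansion_terms(query: List[str], rules: List[Rule], top_k: int) -> Dict[str, float]:
    """Pick up to top_k new terms from rules whose antecedent is contained in the query.
    Hypothetical scoring: a term's score is the highest confidence among its rules."""
    q = set(query)
    scores: Dict[str, float] = {}
    for ante, cons, _sup, conf in rules:
        if ante <= q and not (cons & q):  # rule fires on the query and adds only new terms
            for term in cons:
                scores[term] = max(scores.get(term, 0.0), conf)
    # Highest score first; ties broken alphabetically for a deterministic result
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return dict(ranked[:top_k])


def expand_query(query: List[str], rules: List[Rule], top_k: int = 3, beta: float = 0.5) -> Dict[str, float]:
    """Original terms keep weight 1.0; expansion terms get beta * score (assumed scheme)."""
    expanded = {term: 1.0 for term in query}
    for term, score in select_expansion_terms(query, rules, top_k).items():
        expanded[term] = beta * score
    return expanded
```

For example, given rules {apple} → {fruit} (confidence 0.8), {apple} → {pie} (0.6), {apple, tree} → {orchard} (0.9) and {banana} → {split} (0.95), expanding the query ["apple"] with `top_k=2` adds "fruit" and "pie" only: the other two rules do not fire because their antecedents are not contained in the query. Keeping the original terms at full weight is a common choice that limits query drift when an expansion term is only loosely related.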
Keywords: information retrieval, query expansion, association rules mining, weighted association rules