Today, the Internet has entered every corner of people's lives. The volume of information on the Internet keeps increasing, and its growth rate is accelerating. Faced with this mass of information, how to obtain the information people need has become an active research topic in information retrieval. Mainstream search engines still rely on keyword matching, but the Internet is so large that keyword matching alone rarely gives users satisfactory results, and query expansion techniques emerged in response. After decades of research, query expansion has matured to some extent, but it still does not completely solve the problem of query accuracy over massive information. Based on an analysis of previous algorithms, this paper proposes a query expansion algorithm that incorporates the idea of crowdsourcing. Experiments show that the new algorithm significantly improves retrieval effectiveness. The main work of the paper is as follows:

First, this paper introduces the research background and development of query expansion and briefly describes the research and work content of this paper.

Then, this paper introduces the theory of information retrieval and query expansion and studies mainstream query expansion algorithms in detail, providing a theoretical basis for this research. It also briefly introduces the principle of crowdsourcing and its implementation algorithm, the EM algorithm, and presents our improvement to the EM algorithm in preparation for combining user query logs with crowdsourcing.

We give a detailed statistical analysis of user query logs, covering users' query words, users' query sessions, and users' clicks. These analyses provide both the motivation and the basis for query expansion.

Using the data set of the Sogou company, we construct a search engine platform for the experiments of this paper.
The platform is based on the Indri search engine: we first preprocess the data and then build the index and run queries with Indri.

This paper presents a crowdsourcing-based query expansion algorithm. Guided by the analysis of user query logs, we recast the user query process as a crowdsourcing process, apply the improved EM algorithm to obtain a list of relevant documents, and then obtain the expansion terms.

The experimental results show that, compared with several classic query expansion algorithms, the proposed algorithm significantly improves retrieval effectiveness as measured by P@20.
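
For readers unfamiliar with the metric, P@20 (precision at 20) is the fraction of the top 20 retrieved documents that are relevant to the query. The following is a minimal sketch of that computation, not the evaluation code used in the experiments. The function name `precision_at_k`, the document identifiers, and the relevance judgments are all hypothetical and chosen only for illustration.

```python
from typing import Hashable, Iterable, Sequence


def precision_at_k(
    ranked_ids: Sequence[Hashable],
    relevant_ids: Iterable[Hashable],
    k: int = 20,
) -> float:
    """Return P@k: the share of the top-k ranked documents that are relevant.

    The denominator is always k, so a result list shorter than k is
    penalised for the missing positions (a common convention).
    """
    if k <= 0:
        raise ValueError("k must be a positive integer")
    relevant = set(relevant_ids)
    # Count how many of the first k results appear in the relevant set.
    hits = sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant)
    return hits / k


# Hypothetical ranking of 30 documents and hypothetical relevance judgments.
ranking = [f"d{i}" for i in range(1, 31)]
judged_relevant = {"d1", "d3", "d5", "d25"}

# d25 is ranked below position 20, so only d1, d3 and d5 count as hits.
print(precision_at_k(ranking, judged_relevant, k=20))
```

In a comparison between expansion methods, this value would typically be averaged over all test queries for each method.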