Study Of Query Intention Classification Based On The Search Engine Log

Posted on: 2017-07-25
Degree: Master
Type: Thesis
Country: China
Candidate: Y H Zhang
GTID: 2348330503983625
Subject: Computer software and theory
Abstract/Summary:
Since the start of the 21st century, information resources have grown explosively. The Internet brings users a wealth of information, but it also makes it increasingly difficult for them to find, among so many network resources, the information that meets their needs. Obtaining the required information accurately and rapidly from this vast body of resources has become the continuing goal of information services. In this context, the search engine has become an important tool that helps users quickly locate Internet resources and access relevant information. However, current search engines rely mainly on keyword matching, and the short queries that users submit are often fuzzy and ambiguous. Users therefore want the search engine to identify automatically the intention contained in a query and to return directly the documents related to their information needs. To identify the user's query intention effectively, existing work addresses two main questions: how to construct the classification system, and how to classify query intention under a given classification system. In this paper, we propose a new classification system based on the Broder classification system and the search engine query log, and we focus on how the classification features affect classification performance. The main research contents of this paper can be summarized as follows:

Firstly, considering the characteristics of the query information presented to the search engine, the information class and the transaction class of the Broder classification system are subdivided and redefined, while the navigation class is retained, so that the user's query intention can be located more accurately. Using K-means clustering, a classification system for query intention is proposed that includes five categories: navigation class, consulting class, resource class, service class and hot spot class.

Secondly, appropriate classification features for the query intention are selected: Query Information, Clicked URL and URL Click Ranks. In view of the characteristics of the user query data, we select the support vector machine (SVM) classification algorithm and use the SVM classifier LIBSVM. The SVM model is obtained by training on the data set with the extracted features. The test data set is then used to evaluate the trained model.

Finally, the accuracy rate and the recall rate are used to evaluate the classification performance for query intention. Because the categories are unevenly distributed, this paper also adds the F value as a more objective measure of the classification results. The experiment investigates the effect of individual features and of features at different levels on classification. Experiments on the data set show that combining multiple features of the query helps to identify the query intention. On the manually annotated test set, the classification of query intention achieves a high accuracy rate and recall rate, and the F value is greater than 0.8. This shows that the method of this paper is feasible for identifying the user's query intention.
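
The evaluation step described in the abstract can be illustrated with a minimal sketch in Python. It is not the thesis's own code. It assumes that "accuracy rate" refers to per-class precision and that the F value is the balanced F1 score (the harmonic mean of precision and recall); the abstract confirms neither. The function names `per_class_scores` and `macro_f1`, the category label strings, and the toy data are hypothetical and chosen only for illustration.

```python
from collections import Counter

# Hypothetical label strings for the five categories named in the abstract.
CATEGORIES = ["navigation", "consulting", "resource", "service", "hot_spot"]


def per_class_scores(gold, predicted, labels=CATEGORIES):
    """Return {label: (precision, recall, f1)} for two parallel label lists.

    Assumption: the abstract's "F value" is the balanced F1 score.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")

    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, predicted):
        if g == p:
            tp[g] += 1          # correct prediction for class g
        else:
            fp[p] += 1          # class p was predicted wrongly
            fn[g] += 1          # a true instance of class g was missed

    scores = {}
    for label in labels:
        p_den = tp[label] + fp[label]
        r_den = tp[label] + fn[label]
        precision = tp[label] / p_den if p_den else 0.0
        recall = tp[label] / r_den if r_den else 0.0
        pr_sum = precision + recall
        f1 = 2 * precision * recall / pr_sum if pr_sum else 0.0
        scores[label] = (precision, recall, f1)
    return scores


def macro_f1(scores):
    """Unweighted mean of per-class F1, so rare classes count equally."""
    return sum(f1 for _, _, f1 in scores.values()) / len(scores)


if __name__ == "__main__":
    # Toy labels, invented for illustration only (not data from the thesis).
    gold = ["navigation", "navigation", "resource", "service"]
    pred = ["navigation", "resource", "resource", "service"]
    for label, (p, r, f) in per_class_scores(gold, pred).items():
        print(f"{label:<11} P={p:.2f} R={r:.2f} F={f:.2f}")
```

Reporting the scores per class, rather than as a single overall accuracy, matches the abstract's concern about unevenly distributed categories: a classifier that ignores a rare class would still show a low F value for that class.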
Keywords/Search Tags: Information Search, Classification System, Query Intention Classification, Classification Characteristic