Classification Of Query Intent Using Encyclopedia Knowledge And Statistical Method

Posted on: 2012-09-28; Degree: Master; Type: Thesis
Country: China; Candidate: G Hu
GTID: 2218330362450409; Subject: Computer Science and Technology
Abstract/Summary:
With the growth of resources and services on the Internet, users increasingly rely on search engines to find relevant information. General-purpose search engines return so much irrelevant information that users must filter the results themselves according to their needs. Vertical search engines, although rarely used, return highly relevant results within one particular domain. However, a user whose query carries multiple intents must consult several vertical search engines, which makes obtaining web knowledge inconvenient. If a general search engine could classify the user's intents accurately, it could merge relevant results from one or several vertical search engines according to the intent types and display them in styles suited to each type, greatly improving user satisfaction. Traditional methods based on machine learning require considerable human effort to perform well. This thesis focuses on classifying query intent with non-statistical and statistical methods based on an encyclopedia, neither of which requires much human effort. We concentrate on the following aspects:

Firstly, after analyzing the challenges facing traditional query intent classification methods, we propose an intent classification algorithm based on encyclopedia knowledge. The method first maps both the intent types and the user query into the encyclopedia representation space, and then classifies the user's intent within this space using a non-statistical method. We compare our method with several traditional methods to validate its effectiveness.

Secondly, because statistical intent classification algorithms often need many human-labeled queries to perform well, we automatically build a large labeled training dataset from seeds of each intent type, designed to imitate real user search queries. We then classify query intent with a Logistic Regression classifier trained on this dataset, and validate its performance by comparing it with classifiers trained on real user queries labeled by humans.

Thirdly, we combine the advantages of the two methods into a better intent classifier and validate its effectiveness on the same evaluation dataset.

Finally, to make search results better reflect users' query intents, we select different vertical search engines according to the user's intent types and calculate an intent relevance score for every search result returned by the general search engine.
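The encyclopedia-based classification in the first aspect can be illustrated with a minimal explicit semantic analysis (ESA) sketch. Everything concrete in it is an assumption made for illustration. This includes the toy four-article encyclopedia, the hypothetical intent seed texts, the TF-IDF weighting, the cosine similarity and the 0.1 threshold. The abstract does not specify the thesis's exact mapping or decision rule. In the sketch, the query and each intent type's seed text are both mapped to a weighted vector over encyclopedia concepts. Every intent whose vector is close enough to the query's is then returned, so a multi-intent query can receive several labels.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy encyclopedia: concept (article title) -> article text.
# A real ESA index would be built from a full Wikipedia dump.
TOY_ENCYCLOPEDIA = {
    "Film": "movie film actor director cinema trailer release",
    "Music": "song album singer band lyrics concert music",
    "Travel": "hotel flight ticket travel tour destination booking",
    "Recipe": "recipe cook ingredient dish bake kitchen food",
}

# Hypothetical seed text describing each intent type (not from the thesis).
INTENT_SEEDS = {
    "video": "movie film trailer",
    "music": "song album lyrics",
    "travel": "hotel flight ticket",
}


def build_esa_index(encyclopedia):
    """Inverted index: word -> {concept: TF-IDF weight of the word in that article}."""
    n_concepts = len(encyclopedia)
    tokens = {c: text.lower().split() for c, text in encyclopedia.items()}
    doc_freq = Counter(w for words in tokens.values() for w in set(words))
    index = defaultdict(dict)
    for concept, words in tokens.items():
        for word, tf in Counter(words).items():
            weight = tf * math.log(n_concepts / doc_freq[word])
            if weight > 0:  # words found in every article carry no signal
                index[word][concept] = weight
    return index


def esa_vector(text, index):
    """Map text into concept space by summing the concept vectors of its words."""
    vec = defaultdict(float)
    for word in text.lower().split():
        for concept, weight in index.get(word, {}).items():
            vec[concept] += weight
    return dict(vec)


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts; 0.0 if either is empty."""
    dot = sum(w * v.get(c, 0.0) for c, w in u.items())
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0


def classify_intents(query, index, intent_vectors, threshold=0.1):
    """Return every intent whose concept vector is similar enough to the query's, best first."""
    q_vec = esa_vector(query, index)
    scores = {intent: cosine(q_vec, vec) for intent, vec in intent_vectors.items()}
    ranked = sorted(scores, key=lambda intent: (-scores[intent], intent))
    return [intent for intent in ranked if scores[intent] >= threshold]


index = build_esa_index(TOY_ENCYCLOPEDIA)
intent_vectors = {intent: esa_vector(seed, index) for intent, seed in INTENT_SEEDS.items()}
print(classify_intents("cheap flight and hotel", index, intent_vectors))  # ['travel']
print(classify_intents("movie soundtrack album", index, intent_vectors))  # ['music', 'video']
```

Returning every intent above the threshold, rather than only the top one, reflects the abstract's point that a single query may carry several intents. No labeled queries are needed, only the encyclopedia and a short description of each intent type.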
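The second aspect, training a statistical classifier on automatically generated data, can be sketched in a similar way. This abstract does not describe the thesis's actual generation procedure or feature set. The seed entities, the cue phrases that turn a seed into a pseudo-query, the bag-of-words features and the hyperparameters below are therefore all hypothetical. The classifier is a plain multinomial logistic regression (softmax regression) fitted by stochastic gradient descent. It is written without external libraries so that it stays self-contained.

```python
import math
import random
from collections import Counter

# Hypothetical seed entities for each intent type.
SEEDS = {
    "video": ["titanic", "avatar", "inception"],
    "music": ["yesterday", "thriller", "imagine"],
    "travel": ["paris", "tokyo", "beijing"],
}
# Hypothetical cue phrases combined with seeds to imitate real queries.
CUES = {
    "video": ["movie", "trailer", "watch online"],
    "music": ["song", "lyrics", "mp3"],
    "travel": ["hotel", "flights to", "tour"],
}


def generate_pseudo_queries(seeds, cues):
    """Build (query, intent) pairs: each bare seed plus seed/cue combinations in both orders."""
    data = []
    for intent, entities in seeds.items():
        for entity in entities:
            data.append((entity, intent))
            for cue in cues[intent]:
                data.append((f"{entity} {cue}", intent))
                data.append((f"{cue} {entity}", intent))
    return data


def featurize(query):
    """Bag-of-words counts."""
    return Counter(query.lower().split())


def predict_proba(x, weights, bias):
    """Softmax over per-class linear scores for a feature Counter x."""
    scores = {c: bias[c] + sum(weights[c].get(f, 0.0) * v for f, v in x.items()) for c in bias}
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {c: math.exp(s - top) for c, s in scores.items()}
    total = sum(exps.values())
    return {c: e / total for c, e in exps.items()}


def train_logistic_regression(data, epochs=50, lr=0.5, l2=1e-3, seed=0):
    """Multinomial logistic regression fitted by SGD on the cross-entropy loss."""
    labels = sorted({y for _, y in data})
    weights = {c: {} for c in labels}  # sparse weight vector per class
    bias = {c: 0.0 for c in labels}
    samples = [(featurize(q), y) for q, y in data]
    rng = random.Random(seed)
    for _ in range(epochs):
        rng.shuffle(samples)
        for x, y in samples:
            probs = predict_proba(x, weights, bias)
            for c in labels:
                grad = probs[c] - (1.0 if c == y else 0.0)  # d(loss)/d(score_c)
                bias[c] -= lr * grad
                w = weights[c]
                for f, v in x.items():
                    # L2 penalty is applied only to features active in this sample.
                    w[f] = w.get(f, 0.0) - lr * (grad * v + l2 * w.get(f, 0.0))
    return weights, bias


def predict(query, weights, bias):
    """Return the most probable intent for a query."""
    probs = predict_proba(featurize(query), weights, bias)
    return max(probs, key=probs.get)


train = generate_pseudo_queries(SEEDS, CUES)
weights, bias = train_logistic_regression(train)
print(predict("lyrics of imagine", weights, bias))     # music
print(predict("cheap hotel in tokyo", weights, bias))  # travel
```

Generating the training data this way removes the need for manual labels. The trade-off is that the classifier only learns the seeds and cue words the generator covers. The thesis evaluates this trade-off by comparing against classifiers trained on human-labeled real queries.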
Keywords/Search Tags: intent classification, Wikipedia, explicit semantic analysis, logistic regression