A Study on the Methods of Chinese Product Query Classification Based on User Behavior and Semantic Expansion

Posted on: 2013-07-02  Degree: Master  Type: Thesis
Country: China  Candidate: Z F Ye  Full Text: PDF
GTID: 2248330362963687  Subject: Software engineering
Abstract/Summary:
Web query classification is the task of assigning a Web search query to one or more predefined categories. It poses two challenges. First, Web queries are usually too short to fully express the user's query intent. Second, training data is scarce, which makes query classification more difficult. Two approaches to query classification are currently most common: improving the accuracy of the classifier by increasing the training data, and enriching the query text by expanding the query itself. This paper studies Chinese product query classification, a special kind of Web query intent classification. Web query classification is an effective way to identify the user's query intent. It not only improves the accuracy of Web search, but can also be applied in many areas, such as vertical search, product recommendation, and advertising recommendation. This paper chooses this domain because product queries are very important to both Web users and commercial search engines, especially as more and more people purchase what they need on the Internet. Another reason is that plenty of data is available in this domain, so we can conduct solid experiments and provide insights for classification problems in other domains.

Two methods, one based on user click behavior and one based on query similarity, are used to collect enough training data from the product search log, which addresses the lack of training data. To address the problem of overly short query text, product queries are enriched in two different ways, one based on a search engine and one based on Chinese Wikipedia. Enrichment based on the search engine gives better results, but it must be performed online: the search engine results have to be fetched and then processed, which is time-consuming. A hybrid method of product query classification is therefore proposed in this paper. In the first step, original queries are put into the classifier. If the classification confidence is above a threshold obtained from experiments, the result is returned directly. Otherwise, the query is enriched using the search engine and put into the classifier again to get the final result. Experiments verify that this method classifies product queries accurately and quickly. A combination of two classifiers is also used to further improve classification accuracy and efficiency. Finally, I apply a hierarchical classification algorithm to product query classification and add the hybrid algorithm to the hierarchical classification, which obtains good classification results.
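The two-step hybrid procedure described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the thesis's implementation: the names `hybrid_classify`, `toy_classify`, `toy_enrich`, `KEYWORDS`, and `SNIPPETS` are hypothetical, the keyword-counting classifier merely stands in for a trained model, the dictionary lookup stands in for an online search engine call, and the English toy queries sidestep Chinese word segmentation. The threshold value is also an assumption; the abstract only says it is obtained from experiments.

```python
from typing import Callable, Tuple

# A classifier maps text to (label, confidence); an enricher maps a query
# to an expanded query. Both are assumptions made for this sketch.
Classifier = Callable[[str], Tuple[str, float]]
Enricher = Callable[[str], str]


def hybrid_classify(query: str, classify: Classifier, enrich: Enricher,
                    threshold: float) -> Tuple[str, float, bool]:
    """Return (label, confidence, used_enrichment)."""
    # Step 1: classify the original query, which is fast and offline.
    label, confidence = classify(query)
    if confidence > threshold:
        return label, confidence, False
    # Step 2: confidence too low, so enrich the query and classify again.
    label, confidence = classify(enrich(query))
    return label, confidence, True


# Hypothetical stand-in for a trained classifier: keyword counts per category.
KEYWORDS = {
    "phones": {"phone", "smartphone", "android"},
    "books": {"novel", "paperback", "author"},
}


def toy_classify(text: str) -> Tuple[str, float]:
    tokens = text.lower().split()
    counts = {label: sum(token in words for token in tokens)
              for label, words in KEYWORDS.items()}
    total = sum(counts.values())
    if total == 0:
        return "unknown", 0.0
    best = max(counts, key=counts.get)
    # Confidence is the share of keyword hits that belong to the best label.
    return best, counts[best] / total


# Hypothetical stand-in for search engine results (normally fetched online).
SNIPPETS = {"galaxy s3": "samsung android smartphone phone"}


def toy_enrich(query: str) -> str:
    return (query + " " + SNIPPETS.get(query.lower(), "")).strip()


if __name__ == "__main__":
    for q in ["android phone", "galaxy s3"]:
        print(q, "->", hybrid_classify(q, toy_classify, toy_enrich, 0.5))
```

In this sketch only queries that the first pass cannot classify confidently pay the cost of enrichment, which is the trade-off between accuracy and speed that the abstract describes.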
Keywords/Search Tags: Product Query Classification, Training Data Collection, Query Enrichment, Hybrid Classification, Hierarchical Classification