
Analysis Of Web Users’ Query Intent

Posted on: 2015-10-22    Degree: Master    Type: Thesis
Country: China    Candidate: H Q Zhang
GTID: 2298330452453275    Subject: Computer Science and Technology
Abstract/Summary:
Since the emergence of the Internet, the volume of web content has grown rapidly every day. This content appears in traditional forms such as web pages, documents, multimedia (images, audio and video), BBS forums, email and blogs, as well as on very popular social networks such as Twitter and Facebook. In such a vast and varied online world, it is becoming ever more difficult for Internet users to find the information they need, so accurately predicting the intent behind a query poses a major challenge for search engines.

Each query reflects the user's own goal, so a search engine should return satisfactory results according to the needs of the individual user rather than merely matching the query terms. Accurately predicting the underlying intent behind queries submitted by web users is therefore a central concern of modern search engines. Early pioneering studies identified users' query intent mainly through manual effort; in this thesis, query intent is identified automatically. To this end, we carried out the following work:

1. The classification standard is based on Broder's taxonomy. Navigational and transactional queries behave almost identically: in both cases the user must first navigate to a website before carrying out further activity on it. The two types also share similar classification features, while both differ markedly from informational queries. Navigational and transactional queries are therefore merged into one category and distinguished from the informational type.
2. To integrate successfully with a search engine, we use a classification algorithm based on machine learning. Because every classification algorithm has its own strengths and weaknesses, several common algorithms were analyzed carefully. Given the vast amount of data on the Internet, the classification model must have low time complexity, so we chose the support vector machine (SVM) as the classification algorithm.
3. The experimental data set comes from real web search engine logs containing about 2 million queries, together with 1,935 manually annotated typical queries.
4. The key to building a good classification model is an adequate set of classification features. Obtaining effective features requires not only search engine logs, which yield users' click-through features such as nCS, nRS and mRank, but also other information. By observing how users employ the search engine to find information, we propose the average number of queries per session (AveQuery) as an effective classification feature. Features derived from the query terms themselves are also combined into the feature set. All features were analyzed statistically on the data set: some separate the classes clearly, while others do not, possibly because their relationship to the query class is not linear.
5. Precision and recall, which are standard evaluation measures in information retrieval, are used to evaluate the classification model. Because informational and non-informational queries are unevenly distributed, the F-value is also used. The results show that combining multiple features helps identify query intent, with classification accuracy of up to 80%.
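The classification standard in item 1 reduces Broder's three query types to two classes. A minimal sketch of that label mapping follows; the enum `BroderType`, the function `to_binary_label` and the +1/-1 encoding (informational as the positive class) are illustrative assumptions, not taken from the thesis.

```python
from enum import Enum


class BroderType(Enum):
    """Broder's three query types (hypothetical enum for illustration)."""
    NAVIGATIONAL = "navigational"
    INFORMATIONAL = "informational"
    TRANSACTIONAL = "transactional"


def to_binary_label(query_type: BroderType) -> int:
    """Collapse Broder's taxonomy into two classes.

    Assumed encoding: +1 for informational queries, -1 for the merged
    navigational/transactional class.
    """
    if query_type is BroderType.INFORMATIONAL:
        return 1
    return -1


annotations = [BroderType.NAVIGATIONAL, BroderType.INFORMATIONAL, BroderType.TRANSACTIONAL]
labels = [to_binary_label(t) for t in annotations]
print(labels)  # [-1, 1, -1]
```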
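Item 2 names an SVM, but the abstract does not state the kernel, solver or library used. As a hedged illustration only, the sketch below trains a linear SVM with Pegasos-style stochastic subgradient descent on the regularized hinge loss. This is a technique swapped in here for clarity, not the thesis's implementation. `train_linear_svm`, `predict`, the regularization weight `lam` and the toy two-feature data are all hypothetical.

```python
import random


def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Pegasos-style stochastic subgradient descent for a linear SVM.

    X: list of feature vectors (lists of floats); y: labels in {-1, +1}.
    A constant 1.0 feature is appended so the bias is learned (and
    regularized) together with the weights, a common simplification.
    """
    rng = random.Random(seed)
    data = [(list(x) + [1.0], label) for x, label in zip(X, y)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, label in data:
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step size
            margin = label * sum(wi * xi for wi, xi in zip(w, x))
            # Gradient step for the L2 regularizer: shrink all weights.
            w = [(1 - eta * lam) * wi for wi in w]
            # Hinge-loss subgradient is non-zero only for margin violations.
            if margin < 1:
                w = [wi + eta * label * xi for wi, xi in zip(w, x)]
    return w


def predict(w, x):
    """Return +1 or -1 from the sign of the linear score (bias included)."""
    score = sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1


# Toy, linearly separable data (hypothetical feature values).
X = [[-2.0, -2.0], [-3.0, -1.0], [2.0, 2.0], [3.0, 1.0]]
y = [-1, -1, 1, 1]
w = train_linear_svm(X, y)
print([predict(w, x) for x in X])  # [-1, -1, 1, 1]
```

Each update costs time linear in the number of features, which fits the abstract's emphasis on low time complexity. A production system would more likely rely on an established SVM library.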
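Item 4 lists the click-through features nCS, nRS and mRank without defining them. The sketch below assumes the definitions used in earlier click-through work on query-type identification. Under those definitions, nCS is the fraction of a query's sessions with fewer than n clicks, and nRS is the fraction of sessions whose clicks all fall within the top n results. Both definitions, the default values of `n` and the session format (a list of clicked result ranks) are assumptions; mRank is omitted here.

```python
from typing import List

# Hypothetical session format: the 1-based ranks of the results clicked
# in one search session for a given query.
Session = List[int]


def n_clicks_satisfied(sessions: List[Session], n: int = 2) -> float:
    """Assumed nCS: fraction of sessions with fewer than n clicks."""
    if not sessions:
        return 0.0
    return sum(1 for s in sessions if len(s) < n) / len(sessions)


def n_results_satisfied(sessions: List[Session], n: int = 5) -> float:
    """Assumed nRS: fraction of sessions whose clicks all fall in the top n.

    Sessions without clicks count as satisfied, because all() over an
    empty list is True.
    """
    if not sessions:
        return 0.0
    return sum(1 for s in sessions if all(r <= n for r in s)) / len(sessions)


sessions = [[1], [1], [1, 3], [2, 7, 9]]
print(n_clicks_satisfied(sessions, n=2))   # 0.5
print(n_results_satisfied(sessions, n=5))  # 0.75
```

Under this reading, a navigational query would be expected to score high on both features, since users typically click a single top-ranked result.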
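The AveQuery feature proposed in item 4 is described only as the average number of queries per session. The sketch below follows one hypothetical reading: each query string is assigned the mean number of queries in the sessions where it appears. The function name `ave_query` and the toy log are illustrative. The presumed intuition is that informational needs involve more query reformulations, and therefore longer sessions, than navigational or transactional ones.

```python
from collections import defaultdict
from typing import Dict, List


def ave_query(sessions: List[List[str]]) -> Dict[str, float]:
    """Hypothetical AveQuery: for each query string, the mean number of
    queries in the sessions in which that query appears."""
    total = defaultdict(int)  # sum of session lengths per query
    count = defaultdict(int)  # number of sessions containing the query
    for session in sessions:
        for q in set(session):  # count each session once per distinct query
            total[q] += len(session)
            count[q] += 1
    return {q: total[q] / count[q] for q in total}


# Toy query log: each inner list is one user session (hypothetical data).
log = [
    ["facebook"],
    ["facebook"],
    ["svm tutorial", "svm kernel", "libsvm"],
    ["svm tutorial"],
]
scores = ave_query(log)
print(scores["facebook"])      # 1.0
print(scores["svm tutorial"])  # 2.0
```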
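Item 5 adds the F-value because informational and non-informational queries are unevenly distributed. The sketch below computes precision, recall and the F-value for one chosen positive class. The F-value is assumed here to be the balanced F1 score, 2PR/(P+R). The example shows why accuracy alone can mislead on imbalanced data: a classifier that never predicts the positive class still reaches 62.5% accuracy on the toy labels, yet its F-value is 0. The labels and function names are hypothetical.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


def precision_recall_f(y_true, y_pred, positive=1):
    """Precision, recall and F1 for the given positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f


# Imbalanced toy labels: +1 = informational (minority), -1 = other.
y_true = [1, 1, 1, -1, -1, -1, -1, -1]
y_pred = [1, 1, -1, 1, -1, -1, -1, -1]
all_negative = [-1] * len(y_true)

print([round(v, 3) for v in precision_recall_f(y_true, y_pred)])  # [0.667, 0.667, 0.667]
print(accuracy(y_true, all_negative))            # 0.625
print(precision_recall_f(y_true, all_negative))  # (0.0, 0.0, 0.0)
```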
Keywords/Search Tags: search engine, query intent, query classification, feature combination