
Automatic Classification And Analysis Of Query Intent

Posted on: 2015-08-04
Degree: Doctor
Type: Dissertation
Country: China
Candidate: X J Zhang
GTID: 1228330428474851
Subject: Information Science
Abstract/Summary:
Information is growing exponentially. The information society gives users a wealth of information, but it also makes it easy for them to get lost in it. Retrieving the needed information rapidly and accurately from such huge resources has therefore become a long-term goal of information services. Under these circumstances, the search engine (SE) has become the tool that helps users quickly locate web resources and access relevant information. However, the short queries that users submit are often fuzzy and ambiguous, and express their information needs only roughly. Users therefore want the SE to identify their intent automatically and return the relevant documents directly. As a consequence, the identification of query intent, i.e. the user's information need or search goal behind a query, has become a research hotspot in both academia and industry.

One of the most important research directions is to classify queries under a given taxonomy. Most recent studies are based on the taxonomy proposed by Broder, which divides query intent into three types: informational, navigational and transactional. Previous research has mainly focused on distinguishing informational queries from navigational queries, and has seldom addressed the automatic classification of all three types. Moreover, although the classification of query intent is needed to guide the optimization of search engine performance, few studies have examined how to use this classification for that purpose. In light of this, this dissertation first realizes the automatic classification of informational, navigational and transactional queries.
Based on this, query intent is analyzed for the first time from the perspectives of SE stability, personalized potential and web dynamics, in the hope of providing theoretical guidance for improving SE performance. The dissertation is divided into seven chapters, as follows:

Chapter 0 introduces the research background and significance. By reviewing a wide range of Chinese and English papers, this chapter surveys the literature on user queries from two perspectives: the identification and the analysis of query intent. For the former, it covers query classification under a given taxonomy as well as query intent identification without a given taxonomy. For the latter, it reviews work on SE stability, personalized potential and web dynamics. On this basis, the research content and methods of the dissertation are determined, followed by the research difficulties and the new viewpoints of this work.

Chapter 1 presents the basic theories. It first identifies the problems users face in expressing information needs within the information retrieval model, then illustrates the role that query intent plays in the relevance of search results, and finally summarizes the concepts, descriptive dimensions, categories and analysis dimensions of query intent.

Chapter 2 realizes the automatic identification of users' query intent. On the basis of human-annotated data, informational, navigational and transactional queries are first classified automatically with text classification methods. Building on the features proposed in prior work, four new levels of features are proposed: query expression, URLs, words of form labels in search results, and query reformulation. The experiment discusses the effect of each single feature and of the five different types of features.
Apart from this, the chapter compares identification performance between rare and non-rare queries, and between ambiguous and non-ambiguous queries, for each type of feature.

Chapter 3 analyzes the SE stability of query intent. Taking three SEs (Baidu, Yahoo and Sogou) as research objects and two months as the observation period, the experiment analyzes how stability changes over time, within the same SE and across different SEs, for informational, transactional and navigational queries.

Chapter 4 analyzes the personalized potential of query intent. The personalized potential of each query intent is first measured with explicit and implicit measures respectively. The effective implicit measures are then identified by analyzing the correlation between implicit and explicit measures. On this basis, for each category of query intent, the chapter also analyzes which query features can effectively characterize its personalized potential.

Chapter 5 analyzes the web dynamics of query intent. The chapter first analyzes it from three aspects: query dynamics, web dynamics and the change of information need. Then, for each category of query intent, it analyzes the change of web documents and the change of information need for different types of query dynamics.

Chapter 6 summarizes the whole dissertation. The contents and views of the work are summarized, some deficiencies are pointed out, and prospects for follow-up study are given.
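
To make the stability analysis described for Chapter 3 more concrete, the sketch below shows one possible way to quantify how much a search engine's results for a query change over an observation period. The abstract does not say which metric the dissertation uses. Jaccard overlap between result lists, the function names `jaccard` and `mean_pairwise_stability`, and the example URLs are all assumptions made for illustration.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard overlap of two result lists, treated as sets of URLs."""
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        return 1.0  # two empty result lists are treated as identical
    return len(set_a & set_b) / len(set_a | set_b)


def mean_pairwise_stability(snapshots):
    """Average Jaccard overlap over all pairs of snapshots for one query."""
    pairs = list(combinations(snapshots, 2))
    if not pairs:
        return 1.0  # a single snapshot cannot disagree with itself
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Hypothetical top-3 results for one query on one engine, on three dates.
snapshots = [
    ["a.example/1", "a.example/2", "a.example/3"],
    ["a.example/1", "a.example/2", "b.example/9"],
    ["a.example/1", "c.example/4", "b.example/9"],
]
stability = mean_pairwise_stability(snapshots)
print(f"mean pairwise stability: {stability:.2f}")
```

Averaging such a score separately over informational, navigational and transactional queries would give one per-intent stability figure per engine. The same `jaccard` function could compare two different engines on the same date. A rank-aware measure may be preferable when the order of results matters, since Jaccard overlap ignores ranking.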
Keywords/Search Tags: Query Intent, Query Classification, Search Engine Stability, Personalized Potential, Web Dynamics