
Research On Topic Based Query Intent Identification

Posted on: 2014-06-24
Degree: Doctor
Type: Dissertation
Country: China
Candidate: W Song
Full Text: PDF
GTID: 1268330392472597
Subject: Computer application technology
Abstract/Summary:
Search engines have become one of the most important interfaces for accessing information on the Internet. Current search engines mostly adopt a keyword-matching query-response paradigm, in which a ranked list of potentially relevant documents is returned in response to a query. For queries that express complex information needs, however, this paradigm cannot fully satisfy users. User-issued queries are usually very short, ambiguous, and multifaceted. The keyword-based strategy focuses only on finding documents that contain the user's keywords and ignores relevance to the user's search intent. As a result, many documents in the search results contain the query keywords but are not relevant to the user's information needs. To help search engines satisfy those needs better, it is necessary to develop techniques for query intent understanding.

The query intent represents the purpose of a user's search; it is an intermediate state between the query keywords and the real information need. Existing work on query intent understanding focuses on identifying navigational queries, that is, determining whether the purpose of a query is to find a specific website. However, only a small proportion of queries are navigational, and more queries are informational queries that carry complex information needs. Query intent understanding for such queries deserves more attention. This thesis focuses on representing, identifying, and applying query intents for informational queries. Specifically, we adopt different topic-based representations to formally describe query intents, attempt to understand query intents from different angles, and develop novel search services and paradigms based on query intents. The main content of this thesis is summarized as follows:

(1) Queries are usually ambiguous. To address this problem, query intents are represented using topical categories that depend on a given topical taxonomy. Queries are mapped into this taxonomy, so query intent identification is cast as a query classification problem in which a query is represented by one or more topics. Because the topical taxonomy is structured, it is effective for describing the semantic information of queries and beneficial for constructing the structure of the information need space. We propose a query topical classification approach based on resources that users annotate naturally: a manually maintained website directory is used to classify URLs into a topical taxonomy; the search results and query logs of the search engine are used to associate queries with URLs; and the queries are then labeled automatically according to the topics of the corresponding URLs. In this way, large-scale labeled queries can be collected with minimal human effort. These labeled queries can be used to train a statistical classifier. The proposed method alleviates the data sparsity problem for query classification, is more accurate, and is highly efficient for online processing. Query topical classification can be applied in many query-intent-related scenarios.

(2) Queries are usually broad (underspecified). To address this problem, query intents are represented with a set of query subtopics. Each subtopic contains an intent phrase that indicates a specific search intent. For example, if the original query is "microsoft", then "microsoft research" and "microsoft surface" could be considered query subtopics, where "research" and "surface" are the intent phrases. Query subtopics do not depend on any predefined topical taxonomy, so they are flexible enough to describe fine-grained query intents. The key challenges for this task are how to extract query subtopic candidates and how to organize these candidates according to intents. We propose a clustering-based query subtopic mining approach. The approach is divided into 4 phases: query keyword identification, query subtopic candidate extraction, query subtopic candidate clustering, and query subtopic ranking. We analyze the characteristics of query subtopic candidates from different information resources and apply an appropriate clustering algorithm. The experiments show that the proposed method outperforms the related searches provided by commercial search engines.

(3) Query intents are usually user dependent. To address this problem, query intents are given a personalized representation according to user topical interests. Users who submit the same query may have different intents. To predict each individual's intent accurately, personal background and historical information should be taken into account. We apply a probabilistic topic model to the user's search history, construct a user interest model, and map user queries to the user's topical interests. This amounts to a form of personalized query intent identification. A personalized search approach is proposed within the language-model-based search framework by incorporating the personalized representation of query intents. To the best of our knowledge, this is the first work to bridge topic-model-based user modeling and personalized search.

(4) We propose multiple-query-subtopic-oriented query summarization, which is a novel search paradigm. This task aims to provide a semi-structured, specific, and informative summary for a query from different aspects. Ideally, the user's information need could be satisfied directly, without further exploring the web pages. We formally define the task, propose the framework, and set up appropriate evaluation standards. A composite-query-based approach is proposed to proactively gather information and model query subtopics based on comparative data mining. This search paradigm can be seen as an application of query subtopic mining.

To conclude, this thesis focuses on representing, understanding, and applying query intents based on topical representations for informational queries. Query topical classification and query subtopic mining can be seen as analyzing user intents from the viewpoint of the crowd, and are used to construct the information need space. The query intent representation based on user topical interests can be seen as personalized query intent identification according to an individual's information. The query topical representations are successfully applied to 2 specific applications: personalized search and query-subtopic-based query summarization. This indicates that appropriate representations and a deeper understanding of query intents are beneficial for providing richer search interfaces and information organization, and for improving search quality and user experience. The topical query intent representation is useful for: (1) constructing the structure of the user information need space, helping users learn about the relevant information better and specify their search goals quickly; (2) providing richer search paradigms, improving search quality, and satisfying user information needs quickly and accurately. We hope the preliminary results and conclusions will be helpful for researchers in related fields.
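The automatic labeling step described in contribution (1) can be illustrated with a minimal sketch. It is not the thesis implementation: the directory entries, the click log, the function name `label_queries`, and the `min_share` vote threshold are all hypothetical, chosen only to show how topics might propagate from directory-classified URLs to the queries associated with them.

```python
from collections import Counter, defaultdict

# Hypothetical stand-in for a manually maintained website directory:
# host -> topical category.
DIRECTORY = {
    "espn.com": "Sports",
    "nba.com": "Sports",
    "python.org": "Computers",
    "docs.python.org": "Computers",
}

# Hypothetical query log: (query, clicked host) pairs.
CLICK_LOG = [
    ("nba scores", "espn.com"),
    ("nba scores", "nba.com"),
    ("python tutorial", "docs.python.org"),
    ("python tutorial", "python.org"),
    ("python tutorial", "espn.com"),
    ("unknown thing", "example.org"),
]


def label_queries(click_log, directory, min_share=0.5):
    """Label each query with the topics of the URLs it is associated with.

    A topic is kept when it receives at least `min_share` of the query's
    votes (an assumed heuristic). Queries with no directory-covered URL
    are left unlabeled; a query may get an empty list if no topic
    reaches the threshold.
    """
    votes = defaultdict(Counter)
    for query, host in click_log:
        topic = directory.get(host)
        if topic is not None:
            votes[query][topic] += 1

    labels = {}
    for query, counter in votes.items():
        total = sum(counter.values())
        labels[query] = sorted(
            topic for topic, count in counter.items() if count / total >= min_share
        )
    return labels


labeled = label_queries(CLICK_LOG, DIRECTORY)
for query, topics in sorted(labeled.items()):
    print(query, "->", topics)
```

Queries labeled this way could then serve as training data for a statistical classifier, as the abstract describes; the classifier itself is outside the scope of this sketch.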
Keywords/Search Tags: Query Intent, Query Topic, Query Subtopic, Personalized Search, Query Summarization