
Research On Query Intent Identification

Posted on: 2015-01-26
Degree: Master
Type: Thesis
Country: China
Candidate: J Q Cui
Full Text: PDF
GTID: 2348330482460378
Subject: Computer technology
Abstract/Summary:
With the explosive growth of information on the Web, search engines play an increasingly important role. Current search engines return results mainly by computing the similarity between user queries and documents. However, similarity-based search returns only pages that contain the query keywords and ignores the real need behind the query. For queries with complex information needs, the search engine often returns many results that do not meet the user's needs. It is therefore necessary to develop techniques for query intent recognition.

So far, the most widely used intent taxonomy is the one proposed by Broder, which distinguishes three types of user intent: informational, navigational and transactional. This thesis gives the taxonomy of query intent a new, domain-specific definition. Within a particular domain, three types of user intent are defined: getting an attribute value, finding relevant information about a particular entity, and operating on an entity. The research centers on two points. First, identify the query domain by classification. Second, within a specific domain, identify the type of query intent, extract the query keywords, and generate the analytical results of query intent.

The problem of identifying the query domain can be cast as classifying queries into different domains. Two problems arise when classifying queries. First, user queries are usually very short, so the features extracted from them are sparse. Second, classification methods based on machine learning require a large amount of training data, but it is unrealistic to annotate queries manually at large scale.
To address the shortness and sparseness of queries, we propose a method that selects more useful features by leveraging Baidu Baike, one of the best Chinese human knowledge bases. To obtain enough annotated queries, we propose an automatic annotation method. First, we classify URLs into domains using a manually maintained website directory. Second, using the query logs of the search engine, we obtain the URL distribution of every query. The queries can then be labeled automatically according to the domains of the corresponding URLs.

Query intent recognition within a specific domain is the other focus of this thesis. First, we build a system of basic concepts and a knowledge base for the specific domain. The basic concepts consist of common vocabulary (such as time and location) and the proper names of the specific domain; the knowledge base consists of entities, attributes and intent feature templates. Second, based on the system of basic concepts and the knowledge base, we analyze the query intent in a given domain, extract the query keywords, and generate the analytical results of query intent.

In summary, the contributions of our work are the following: we propose and implement a method for automatic annotation of query domains based on the query click log; expand query features by leveraging Baidu Baike; put forward a new taxonomy for query intent; and introduce the concept of the intent feature and implement a system for query intent identification. The final results show that query classification performs well with query expansion; in the specific domain, the coverage of query intent identification is higher, and the analytical results of queries have high accuracy and recall.
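
The automatic annotation method summarized above (website directory, click log, query label) can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the directory contents, the example hosts, the function names, and the `min_clicks` and `min_share` thresholds are all assumptions introduced here for illustration, since the abstract does not say how the URL distribution of a query is turned into a label.

```python
from collections import Counter, defaultdict
from urllib.parse import urlparse

# Hypothetical stand-in for the manually maintained website directory:
# maps a host name to a domain label. The hosts are made up.
SITE_DIRECTORY = {
    "music.example.com": "music",
    "video.example.com": "video",
    "travel.example.com": "travel",
}


def url_domain(url):
    """Return the directory domain of a URL, or None if the host is unlisted."""
    host = urlparse(url).netloc.lower()
    return SITE_DIRECTORY.get(host)


def label_queries(click_log, min_clicks=3, min_share=0.6):
    """Label each query with the dominant domain of its clicked URLs.

    click_log is an iterable of (query, clicked_url) pairs.
    min_clicks and min_share are assumed thresholds, not values from the thesis:
    a query is labeled only if it has enough classified clicks and one domain
    accounts for a large enough share of them.
    """
    # Build the per-query distribution over domains.
    distribution = defaultdict(Counter)
    for query, url in click_log:
        domain = url_domain(url)
        if domain is not None:
            distribution[query][domain] += 1

    labels = {}
    for query, counts in distribution.items():
        total = sum(counts.values())
        domain, top = counts.most_common(1)[0]
        if total >= min_clicks and top / total >= min_share:
            labels[query] = domain
    return labels
```

Queries whose clicks are too few or too evenly spread across domains are left unlabeled in this sketch, which trades coverage for cleaner training labels; whether the thesis makes the same trade-off is not stated in the abstract.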
Keywords/Search Tags: query intent, query domain identification, query intent identification, concept system, intent feature template, keyword extraction