Research On Search Engine Oriented Natural Language Processing Technology

Posted on:2012-01-10

Degree:Doctor

Type:Dissertation

Country:China

Candidate:S S Li

Full Text:PDF

GTID:1118330362960508

Subject:Computer Science and Technology

Abstract/Summary:


With the rapid development of the Internet, the amount of online information keeps growing, and it is becoming increasingly difficult for users to obtain the information they need accurately and quickly, a problem known as information anxiety. Web search engines provide a keyword-matching-based retrieval mechanism that helps users find what they want instantly, and they have become the most efficient tools for relieving information anxiety. Web search is now a daily activity on the Internet and has created huge business opportunities. However, as the variety of information on the Web increases, the weaknesses of keyword-based search engines are becoming apparent: it is difficult to construct a query that accurately expresses an information need, returned results are often redundant or useless, and performance on retrieving subjective information is low. To better meet users' needs, the third generation of search engines, which are human-oriented, intelligent, and personalized, has been widely studied.
In recent years, with the shift from keyword-based search to knowledge-based search, natural language processing has become an emerging technique and a new research hotspot. Natural language processing in search engines mainly focuses on query understanding, query reformulation, search result organization, and related tasks. It aims to provide more intelligent and more human-friendly human-computer interaction that helps users obtain the information they need more conveniently. In this dissertation, we investigate query suggestion, query intent identification, query semantic structure understanding, and answer summarization for the reuse of Q&A archives. The key contributions and innovations are briefly summarized as follows:
1. Research on comparative relation based query suggestion, and proposal of a weakly supervised method for mining comparators from comparative questions.
Query suggestion usually recommends queries relevant to the user's original query. For example, a search engine recommends "ipod touch jailbreak" to users who issue the query "ipod touch". However, users prefer different related queries in different search scenarios. In a purchasing scenario, for example, users who issue a query like "Nikon d200" usually want to learn about the product and compare it with comparable products before making a purchase decision. In this case, suggesting queries like "Canon 300d" and providing the corresponding comparison information helps users reach a purchase decision quickly. "Nikon d200 lens" is also a useful suggestion, but a suggestion like "Canon 300d" requires domain knowledge that users may lack, and it is often exactly what they want to know. It is therefore worthwhile to classify query suggestions by their semantic relation to the original query and to offer different kinds of suggestions in different scenarios; doing so can improve retrieval performance and make search engines more intelligent and personalized. Since comparing candidates is an essential step in users' decision making, we focus on comparison search scenarios and investigate query suggestion based on comparison relations.
In general, it is difficult to decide whether two entities are comparable, because comparison is subjective and complex. Fortunately, many comparative questions that explicitly compare two or more entities are posted online. These questions, e.g. "Which to buy, iPod or iPhone?", provide evidence of what people want to compare.
We call the entities that are the targets of comparison in comparative questions comparators, such as "iPod" and "iPhone" in the example above. To mine comparators from comparative questions, we first have to detect whether a question is comparative. By our definition, a comparative question must be a question intended to compare at least two entities. Note that a question containing two or more entities is not comparative if it has no comparison intent. However, we observe that a question is very likely to be comparative if it contains at least two potentially comparable entities. We leverage this insight to develop a weakly supervised bootstrapping method that identifies comparative questions and extracts comparators simultaneously. To the best of our knowledge, this is the first work that specifically addresses the problem of finding good comparators to support users' comparison activities. We are also the first to propose mining comparable entities from comparative questions posted online, which reflect what users truly care about. Our weakly supervised method achieves an F1-measure of 82.5% in comparative question identification, 83.3% in comparator extraction, and 76.8% in end-to-end comparative question identification and comparator extraction.
2. Proposal of a graph clustering based user intent detection method using comparison relations, and construction of a comparison information retrieval system oriented to users' comparison behavior.
In keyword-based search engines, people must describe their information needs with queries consisting of a few keywords. Because information is lost when a need is abstracted into keywords, the search intent expressed by a query may be unclear. Currently, search engines usually return a mixed set of documents relevant to various query intents.
Users need to browse a large number of documents to find the ones that match their search intent. Determining a user's search intent and performing intent-oriented search therefore helps users obtain information more accurately and quickly.
As discussed above, a single query may carry multiple user intents. For example, the query "apple" may refer to a fruit or to an electronics brand. When "apple" refers to the brand, the user may want to learn about Apple products or find the location of Apple stores. If a user wants to buy an Apple product and issues the query "ipod touch", he may want product information, price comparisons across web sites, or a comparison with other products. Even when we are sure that a user wants to compare the "ipod touch" with other products, the comparison may concern different aspects: in terms of product updates, people may compare the "ipod touch" with the "ipod classic", while in terms of entertainment they may compare it with the "psp". In short, understanding user intent clearly is not a trivial task. In this dissertation, we focus on users' comparison behavior and propose a graph clustering based user intent detection method that uses comparison relations. Each query intent is represented by a set of comparators of the original query, and a semantic label is assigned to each detected intent using an information extraction method. Experiments show that intent detection reaches an accuracy of 92.7%. In addition, we build a comparison intent detection system that, for a given query, provides different comparators and the corresponding comparison information under each comparison intent.
3. Research on open-domain query understanding, and proposal of pattern-based query understanding methods for multi-term queries.
Besides entity queries, there are many complex queries consisting of multiple query terms, e.g., "flight from Beijing to New York". To determine the intent of such a query, we need to recognize and disambiguate each query term. In particular, search engines have crawled a large amount of structured data, which is inherently less ambiguous. When searching structured data, it is beneficial to convert keyword queries into SQL-like queries, for which query term recognition and disambiguation are essential. We refer to the process of recognizing and disambiguating query terms as query understanding. For example, given the query "harry potter showtime in beijing", we first need to recognize "harry potter", "showtime", and "beijing" as query terms, and then disambiguate their semantics with labels, e.g., "harry potter" as a movie name, "beijing" as a city, and "showtime" as an attribute of a movie. In this dissertation, we focus on understanding multi-term queries in the open domain. We first construct a semantic dictionary with existing methods and then perform open-domain query understanding (namely, query term recognition and disambiguation) using the dictionary. In particular, we address two problems that arise in this setting:
(1) Automatically constructed lexicons contain considerable noise in both labels and term instances, and such noise can seriously degrade query understanding performance.
(2) The open-domain setting requires a vast number of labels, which makes it hard to apply previous query understanding approaches based on sequence labeling, since those approaches were designed for a limited set of term labels.
To resolve these problems, we propose a pattern-based method that recognizes terms and disambiguates their labels. We first construct semantic lexicons by applying a previously developed method for extracting hyponymy relations. We then propose a mutual reinforcement algorithm to mine context patterns, and perform term recognition and disambiguation based on the mined context patterns and the semantic lexicons. To our knowledge, this is the first attempt to understand open-domain queries using automatically mined lexicons.
4. Research on answer completeness in reusing Q&A resources collected by Community Question Answering (cQA) services, and proposal of question-oriented answer summarization based on a hierarchical structure of semantic dependencies among terms.
Traditional search engines do not work as well as expected on complex question queries, e.g., "how to recover my doc file" or "what is the best smart phone?". Such complex questions usually relate to personal experiences or opinions and receive different answers from different people. Fortunately, cQA services provide large knowledge resources for this kind of question, and reusing cQA archives to improve satisfaction with complex question queries has become an attractive research area. However, current research mainly focuses on assessing whether cQA answers are accurate enough to be reused and ignores their completeness.
In fact, since complex questions do not have unique answers, answer completeness is also a critical factor in improving satisfaction with information retrieval. In this dissertation, we perform answer summarization for a particular type of question: survey questions, which ask for recommendations on the best choices. For such questions, answer completeness is clearly crucial, because different users may be interested in different suggested choices. To the best of our knowledge, this is the first research to point out the importance of answer completeness in cQA knowledge reuse. We are also the first to focus on survey questions, an interesting type of opinion question whose answer completeness is potentially important for better reuse. Additionally, we propose generating complete answers through question-oriented answer summarization: an efficient algorithm builds a hierarchical structure of semantic dependencies among terms, and question-oriented summarization over this structure produces a complete answer from the existing user answers in cQA services. The experimental results are promising.
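The bootstrapping idea behind contribution 1 (a question containing two potentially comparable entities is likely to be comparative, and known comparator pairs in turn reveal new comparative patterns) can be illustrated with a minimal Python sketch. It is not the dissertation's algorithm: it assumes single-token comparators, treats the tokens between two comparators as a "pattern", and promotes a candidate pattern only when at least `min_support` distinct comparator pairs support it. The function names, parameters (`bootstrap`, `min_support`, `max_gap`), and example questions are hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text):
    # Lowercased alphanumeric tokens; comparators are assumed to be single tokens.
    return re.findall(r"[a-z0-9]+", text.lower())


def extract_pairs(tokens, pattern):
    """Return (x, y) pairs for every occurrence of: x <pattern tokens> y."""
    n = len(pattern)
    return [
        (tokens[i], tokens[i + 1 + n])
        for i in range(len(tokens) - n - 1)
        if tuple(tokens[i + 1:i + 1 + n]) == pattern
    ]


def induce_patterns(questions, pairs, max_gap=3):
    """Map each token sequence found between a known pair to the pairs supporting it."""
    support = defaultdict(set)
    for tokens in questions:
        for x, y in pairs:
            if x in tokens and y in tokens:
                i, j = tokens.index(x), tokens.index(y)
                if 0 < j - i - 1 <= max_gap:
                    support[tuple(tokens[i + 1:j])].add((x, y))
    return support


def bootstrap(raw_questions, seed_patterns, min_support=2, max_iter=5):
    questions = [tokenize(q) for q in raw_questions]
    patterns, pairs, comparative = set(seed_patterns), set(), set()
    for _ in range(max_iter):
        # Step 1: questions matched by a pattern are comparative; harvest comparators.
        for idx, tokens in enumerate(questions):
            for pattern in patterns:
                found = extract_pairs(tokens, pattern)
                if found:
                    comparative.add(idx)
                    pairs.update(found)
        # Step 2: promote separators shared by enough distinct comparator pairs.
        support = induce_patterns(questions, pairs)
        new = {p for p, s in support.items() if len(s) >= min_support} - patterns
        if not new:
            break
        patterns |= new
    return patterns, pairs, comparative


questions = [
    "Which to buy, iPod or iPhone?",
    "Nikon or Canon for a beginner?",
    "nikon vs canon which is better",
    "ipod vs iphone battery life",
    "how to jailbreak ipod touch",
]
patterns, pairs, comparative = bootstrap(questions, {("or",)})
print(sorted(patterns), sorted(pairs), sorted(comparative))
# [('or',), ('vs',)] [('ipod', 'iphone'), ('nikon', 'canon')] [0, 1, 2, 3]
```

Starting from the single seed pattern "or", the first round extracts (ipod, iphone) and (nikon, canon); these pairs then reveal "vs" as a second pattern, which marks the third and fourth questions as comparative, while the jailbreak question is never marked.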
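For contribution 2, one simple way to picture graph clustering of comparators is to connect two comparators of the query whenever they are themselves often compared with each other, and to treat each resulting cluster as one comparison intent. The sketch below substitutes connected components over a thresholded graph for the dissertation's clustering algorithm, which the abstract does not specify, and it omits the semantic labeling step. The names `pair_counts`, `min_weight`, and `detect_intents`, as well as all counts in the example, are invented for illustration.

```python
from collections import defaultdict


def comparators_of(query, pair_counts):
    """Entities that have been compared with `query` at least once."""
    comps = set()
    for a, b in pair_counts:
        if a == query:
            comps.add(b)
        elif b == query:
            comps.add(a)
    return comps


def detect_intents(query, pair_counts, min_weight=2):
    """Cluster the query's comparators; each cluster stands for one comparison intent."""
    comps = comparators_of(query, pair_counts)
    # Graph over comparators: an edge means the two are often compared with each other.
    adj = defaultdict(set)
    for (a, b), weight in pair_counts.items():
        if a in comps and b in comps and weight >= min_weight:
            adj[a].add(b)
            adj[b].add(a)
    # Connected components (iterative depth-first search) serve as the clusters.
    seen, clusters = set(), []
    for start in sorted(comps):
        if start in seen:
            continue
        stack, cluster = [start], set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            cluster.add(node)
            stack.extend(adj[node] - seen)
        clusters.append(cluster)
    return clusters


# Invented toy counts: how often each pair of entities is compared.
pair_counts = {
    ("ipod touch", "ipod classic"): 5,
    ("ipod touch", "ipod nano"): 4,
    ("ipod touch", "psp"): 3,
    ("ipod touch", "nintendo ds"): 2,
    ("ipod classic", "ipod nano"): 6,
    ("psp", "nintendo ds"): 7,
    ("psp", "ipod nano"): 1,
}
for cluster in detect_intents("ipod touch", pair_counts):
    print(sorted(cluster))
```

With these toy counts, the comparators of "ipod touch" split into {ipod classic, ipod nano} and {nintendo ds, psp}, mirroring the product-update and entertainment intents described above; the weak psp/ipod nano edge falls below the threshold and does not merge the two clusters.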
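Contribution 3 combines a semantic lexicon with context patterns to recognize and disambiguate query terms. A minimal sketch, assuming the lexicon maps each term to a set of labels and that mined patterns are scored label sequences covering the whole query (a strong simplification of context patterns): greedy longest-match segmentation performs recognition, and choosing the highest-scoring label sequence performs disambiguation. The lexicon entries, labels, scores, and function names are hypothetical, and the mutual reinforcement step that would mine the patterns is not shown.

```python
from itertools import product


def segment(query, lexicon, max_len=3):
    """Recognition: greedy longest-match segmentation against the lexicon.

    Words not covered by any lexicon entry are kept as single-word terms.
    """
    words = query.lower().split()
    terms, i = [], 0
    while i < len(words):
        for n in range(min(max_len, len(words) - i), 0, -1):
            candidate = " ".join(words[i:i + n])
            if candidate in lexicon or n == 1:
                terms.append(candidate)
                i += n
                break
    return terms


def understand(query, lexicon, patterns):
    """Disambiguation: pick the label sequence with the highest pattern score."""
    terms = segment(query, lexicon)
    # A word outside the lexicon uses its own surface form as its only "label".
    options = [sorted(lexicon.get(t, {t})) for t in terms]
    best, best_score = None, 0.0
    for labels in product(*options):
        score = patterns.get(labels, 0.0)
        if score > best_score:
            best, best_score = labels, score
    return None if best is None else list(zip(terms, best))


# Hypothetical lexicon (term -> candidate labels) and scored label patterns.
lexicon = {
    "harry potter": {"movie", "book"},
    "showtime": {"movie attribute"},
    "beijing": {"city"},
}
patterns = {
    ("movie", "movie attribute", "in", "city"): 0.9,
    ("book", "price", "in", "city"): 0.4,
}
print(understand("harry potter showtime in beijing", lexicon, patterns))
# [('harry potter', 'movie'), ('showtime', 'movie attribute'), ('in', 'in'), ('beijing', 'city')]
```

Enumerating every label combination is exponential in query length and is tolerable here only because queries are short; for "harry potter showtime in beijing", the pattern selects the movie sense of "harry potter" over the book sense.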

Keywords/Search Tags:

Intelligence Search Engine, Natural Language Processing, Query Suggestion, Query Intent Detection, Query Understanding, Answer Summarization, Information Extraction


Related items

1	On The Study Of Natural Query Language Understanding In Computer
2	Research On Topic Based Query Intent Identification
3	Understanding The Semantic Intent Of Domain-Specific Natural Language Query
4	Research And Implementation Of Query Suggestion Model With Query Log And Corpus Data In Meta-search Engine
5	The Research And Implementation Of Query Suggestion Mechanism In The Absence Of Logs For Specific Search Engine
6	Relevant Techniques Of Named Entity Query Processing For Search Engine
7	Automatic Classification And Analysis Of Query Intent
8	Research On Query Intent Identification
9	Analysis Of Web Users’ Query Intent
10	Research On Key Technologies Of Distributed Rank-aware Query Processing