Research On Search Engine Oriented Natural Language Processing Technology

Posted on: 2012-01-10
Degree: Doctor
Type: Dissertation
Country: China
Candidate: S S Li
GTID: 1118330362960508
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of the Internet, the amount of information online keeps growing. It is becoming more and more difficult for Internet users to obtain the information they need accurately and quickly, which results in so-called information anxiety. Web search engines provide an information retrieval mechanism based on keyword matching that helps users get what they want instantly, and they have become the most efficient tools for relieving information anxiety. Web search is now a daily activity on the Internet and has brought huge business opportunities. However, as the variety of information on the network increases, the weaknesses of keyword-based search engines are becoming apparent: it is difficult to construct a query that accurately expresses the user's information need, the returned results are often redundant or useless, and performance on retrieving subjective information is low. To best meet users' needs, the third generation of search engines, which are human-oriented, intelligent, and personalized, has been widely studied.

In recent years, with the shift from keyword-based search to knowledge-based search, natural language processing has become an emerging technique and a new research hotspot. Natural language processing techniques in search engines mainly focus on query understanding, query reformulation, search result organization, and so on. They strive to provide more intelligent and more natural human-computer interaction that helps users get the required information more conveniently. In this dissertation, we investigate query suggestion, query intent identification, query semantic structure understanding, and answer summarization in the reuse of Q&A archives. The key contributions and innovations can be briefly summarized as follows:

1. Research on comparative-relation-based query suggestion, and proposal of a weakly supervised method for mining comparators from comparative questions.

Usually, query suggestion recommends queries relevant to the user's original query. For example, a search engine recommends "ipod touch jailbreak" to users who issue the query "ipod touch". However, users prefer different relevant queries in different search scenarios. In a purchasing scenario, for example, users who issue a query like "Nikon d200" usually want to learn about the product and compare it with comparable products in order to make a purchase decision. In this case, suggesting queries like "Canon 300d" and providing the corresponding comparison information helps users make a purchase decision quickly. Compared with "Nikon d200 lens", which is also a useful suggestion, "Canon 300d" is a suggestion that users could only formulate themselves if they already had the relevant knowledge, and it is usually what they want to know. It is therefore meaningful to classify relevant query suggestions by their semantic relation to the user's original query and to provide different kinds of suggestions in different scenarios: doing so improves retrieval performance and makes the search engine more intelligent and personalized. Considering that comparing candidates is an essential step in users' decision making, we focus on comparison search scenarios and investigate query suggestion based on comparison relations.

In general, it is difficult to decide whether two entities are comparable, because comparison is subjective and complex. Fortunately, plenty of comparative questions, which explicitly intend to compare two or more entities, are posted online. These comparative questions provide evidence of what people want to compare, e.g., "Which to buy, iPod or iPhone?". We call the entities that are the targets of comparison in a comparative question comparators, such as "iPod" and "iPhone" in the example above.
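
To make the notion of a comparator concrete, the following minimal Python sketch pulls a comparator pair out of a question. It is an illustration only: the two regular expressions and the function name `extract_comparators` are hypothetical and hand-written, whereas the dissertation mines its extraction patterns with a weakly supervised bootstrapping method.

```python
import re

# Hypothetical, hand-written surface patterns used only for illustration.
# The dissertation learns its patterns by weakly supervised bootstrapping.
COMPARATIVE_PATTERNS = [
    # "..., A or B?"
    re.compile(r"(?:^|[,:]\s*)(?P<a>[\w ]+?)\s+or\s+(?P<b>[\w ]+?)\s*\?"),
    # "A vs B" / "A versus B"
    re.compile(
        r"(?:^|[,:]\s*)(?P<a>[\w ]+?)\s+(?:vs\.?|versus)\s+(?P<b>[\w ]+?)\s*(?:\?|$)"
    ),
]


def extract_comparators(question):
    """Return a (comparator, comparator) pair, or None if no pattern matches."""
    text = question.strip()
    for pattern in COMPARATIVE_PATTERNS:
        match = pattern.search(text)
        if match:
            return match.group("a").strip(), match.group("b").strip()
    return None


print(extract_comparators("Which to buy, iPod or iPhone?"))
print(extract_comparators("How do I reset my iPod?"))
```

Fixed patterns like these miss most real phrasings and can fire on questions with no comparison intent, which is one reason a method that learns patterns from data is attractive.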
To mine comparators from comparative questions, we first have to detect whether a question is comparative. According to our definition, a comparative question is a question that intends to compare at least two entities. Note that a question containing at least two entities is not a comparative question if it has no comparison intent. However, we observe that a question is very likely to be comparative if it contains at least two potentially comparable entities. We leverage this insight and develop a weakly supervised bootstrapping method that identifies comparative questions and extracts comparators simultaneously.

To the best of our knowledge, this is the first attempt to specifically address the problem of finding good comparators to support users' comparison activity. We are also the first to propose using comparative questions posted online, which reflect what users truly care about, as the medium from which to mine comparable entities. Our weakly supervised method achieves an F1-measure of 82.5% in comparative question identification, 83.3% in comparator extraction, and 76.8% in end-to-end comparative question identification and comparator extraction.

2. Proposal of a graph-clustering-based user intent detection method that utilizes comparison relations, and construction of a comparison information retrieval system oriented to comparison behavior.

In keyword-based search engines, people are asked to describe their information needs with queries consisting of a limited number of keywords. Because information is lost when user needs are abstracted into keywords, the search intent expressed in a query may be unclear. Currently, search engines usually return a mixed set of documents relevant to various query intents, and users need to browse a large number of documents to find those that exactly meet their search intent.
Therefore, determining the user's search intent and performing intent-oriented search will help users acquire information more accurately and quickly.

As discussed above, there may be multiple user intents behind a query. For example, the query "apple" may refer to a kind of fruit or to an electronics brand. When "apple" means the electronics brand, a user who issues the query may intend to learn about Apple products or to find the location of Apple stores. If a user wants to purchase an Apple product and issues the query "ipod touch", for example, he or she may want to see relevant product information, compare prices on different web sites, or compare the product with other products. Even when we are sure that the user wants to compare the queried "ipod touch" with other products, the comparison may concern different aspects. For example, in terms of product updates, people may want to compare "ipod touch" with "ipod classic", while in terms of entertainment, people may want to compare "ipod touch" with "psp". In short, understanding the user's intent clearly is not a trivial task.

In this dissertation, we focus on users' comparison behavior and propose a graph-clustering-based user intent detection method that utilizes comparison relations. A user's query intent is expressed by a set of comparators to the original query, and a semantic label is assigned to each detected query intent using an information extraction method. Experiments show that the accuracy of intent detection reaches 92.7%. In addition, we build a user comparison intent detection system that provides different comparators, together with the corresponding comparison information, for a given query under different comparison intents.

3. Research on query understanding in the open domain, and proposal of a pattern-based query understanding method for multi-term queries.

Besides entity queries, there are large numbers of complex queries consisting of multiple query terms, e.g., "flight from Beijing to New York".
To determine the intent of such queries, we need to recognize and disambiguate each query term. In particular, search engines have crawled a large amount of structured data, which is less ambiguous by nature. When searching against structured data, it is beneficial to convert keyword queries into SQL-like queries, for which query term recognition and disambiguation are essential. We refer to the process of recognizing and disambiguating query terms as query understanding. For example, given the query "harry potter showtime in beijing", we first need to recognize "harry potter", "showtime", and "beijing" as query terms, and then disambiguate the semantics of the terms with relevant labels, e.g., "harry potter" as "movie name", "beijing" as "city", and "showtime" as an attribute term for a movie.

In this dissertation, we focus on query understanding for multi-term queries in the open domain. We first construct a semantic dictionary with existing methods, and then examine open-domain query understanding (namely query term recognition and disambiguation) using the dictionary. In particular, we address two problems that follow from our problem setting. (1) Automatically constructed lexicons contain much noise in both labels and term instances, and such noise can seriously degrade query understanding performance. (2) An open-domain environment requires a vast number of labels, which makes it hard to apply previous query understanding approaches based on sequential labeling techniques, since these were originally developed for a limited number of term labels.

To resolve these problems, we propose a pattern-based method to recognize a term and disambiguate its labels. In our approach, we first construct semantic lexicons by applying an existing method for extracting hyponymy relations. Then, we propose a mutual reinforcement algorithm to mine context patterns. Based on the mined context patterns and the semantic lexicons, we perform term recognition and disambiguation.
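
The two steps described above, term recognition and label disambiguation, can be sketched in Python as follows. This is a toy illustration under stated assumptions, not the dissertation's algorithm: the lexicon and context cues are tiny hand-made dictionaries (the names `LEXICON`, `CONTEXT_CUES`, `recognize_terms`, and `disambiguate`, and the labels "book title" and "movie attribute", are hypothetical), recognition uses greedy longest matching, and disambiguation simply counts co-occurring cue terms instead of using patterns mined by mutual reinforcement.

```python
# Hypothetical hand-made lexicon: term -> candidate semantic labels.
# In the dissertation the lexicon is mined automatically and is noisy.
LEXICON = {
    "harry potter": {"movie name", "book title"},
    "showtime": {"movie attribute"},
    "beijing": {"city"},
}

# Hypothetical context cues standing in for mined context patterns:
# a label gains support when one of its cue terms also occurs in the query.
CONTEXT_CUES = {
    "movie name": {"showtime", "trailer"},
    "book title": {"author", "paperback"},
}


def recognize_terms(query, lexicon):
    """Greedy longest-match segmentation of the query against the lexicon."""
    tokens = query.lower().split()
    terms, i = [], 0
    while i < len(tokens):
        for j in range(len(tokens), i, -1):  # try the longest span first
            candidate = " ".join(tokens[i:j])
            if candidate in lexicon:
                terms.append(candidate)
                i = j
                break
        else:
            i += 1  # token is not part of any known term; skip it
    return terms


def disambiguate(terms, lexicon, cues):
    """Pick, for each term, the label best supported by the other terms."""
    labeled = []
    for term in terms:
        others = set(terms) - {term}
        # Sorting makes tie-breaking deterministic.
        best = max(
            sorted(lexicon[term]),
            key=lambda label: len(cues.get(label, set()) & others),
        )
        labeled.append((term, best))
    return labeled


terms = recognize_terms("harry potter showtime in beijing", LEXICON)
print(disambiguate(terms, LEXICON, CONTEXT_CUES))
```

Here "harry potter" is labeled "movie name" rather than "book title" because the cue term "showtime" occurs in the same query; a labeled query of this form is what would then be converted into an SQL-like query over structured data.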
To our knowledge, our study is the first attempt to understand open-domain queries using automatically mined lexicons.

4. Research on answer completeness when reusing Q&A resources collected by Community Question Answering (cQA) services, and proposal of question-oriented answer summarization based on a hierarchical structure of semantic dependencies among terms.

Traditional search engines do not work as well as expected on complex question queries, e.g., "how to recover my doc file" or "what is the best smart phone?". Such complex questions usually relate to personal experience or opinion and receive different answers from different individuals. Fortunately, the emergence of cQA services provides large knowledge resources for this kind of question. How to reuse the Q&A archives of cQA services to improve user satisfaction on complex question queries has become an attractive research field. However, current research mainly focuses on assessing whether answers in cQA are accurate enough to be reused, and ignores the completeness of answers. In fact, since the answers to complex questions are not unique, answer completeness is also a critical factor in improving satisfaction with information retrieval.

In this dissertation, we perform answer summarization for a particular type of question: survey questions, which ask for recommendations on the best choices. Completeness of the answer is crucial here because different users may be interested in different suggested choices.

To the best of our knowledge, this is the first research to point out the importance of answer completeness in cQA knowledge reuse. We are also the first to focus on survey questions, an interesting type of opinion question for which answer completeness is potentially important for better reuse. Additionally, we recommend generating complete answers by question-oriented answer summarization.
We propose an efficient algorithm to build a hierarchical structure of semantic dependencies among terms, and perform question-oriented summarization over this structure to generate a complete answer from the existing user answers in cQA services. The performance is promising.
Keywords/Search Tags: Intelligent Search Engine, Natural Language Processing, Query Suggestion, Query Intent Detection, Query Understanding, Answer Summarization, Information Extraction