Relevant Techniques Of Named Entity Query Processing For Search Engine

Posted on: 2013-02-01
Degree: Doctor
Type: Dissertation
Country: China
Candidate: D Y Wu
Full Text: PDF
GTID: 1268330392467597
Subject: Computer application technology
Abstract/Summary:
At present, the Internet is an important platform on which people access information and make transactions. With the explosive growth of information resources and applications on the Internet, search engines have become an indispensable tool that guides people quickly and precisely to the information they need. Users issue queries to a search engine to represent their information needs, and the search engine analyzes the queries to return the results the users need. Queries are thus the medium through which users’ information needs are delivered to a search engine. To help search engines better understand the information needs behind queries, it is necessary to carry out research on techniques for processing and analyzing queries.

Named entity queries are an important type of query and account for a high percentage of search engine queries. They have special features and attributes. Research on named entity query processing helps a search engine better understand the search intent that users express in their queries, which in turn helps it provide more precise search results and a better search experience. Relevant research on named entity query processing includes acquiring semantic segments in queries, recognizing the named entities in queries, and analyzing the search intent of queries. The main contents of our research can be summarized as follows:

1. Unsupervised query segmentation based on a monolingual word alignment model. Query segmentation, a fundamental and essential query processing task, obtains a sequence of words or phrases by segmenting a sequence of characters. A large number of words appear in queries, and many of them are informal. Supervised segmentation methods need a large amount of manually annotated training data, which makes them unsuitable for query segmentation.
Therefore, in this work we propose an approach to unsupervised query segmentation in which the segmentation model is trained using only the query log. By effectively combining information about character co-occurrence, position and fertility in queries, the query segmentation model achieves good performance. We further carry out research on multilevel query segmentation, in which a query is parsed into a tree structure. The tree structure shows which segments in a query are closely related to each other. The experimental results show that our approach achieves higher accuracy than existing methods, which demonstrates that it is effective.

2. Mining named entities in query logs based on random walk on a graph. The queries in a query log contain many named entities. Named entities mined from a query log match the queries that users construct in practice. The query log of a search engine is constantly updated and can contain a number of new named entities. Therefore, mining named entities is useful for a search engine that processes named entity queries. This work proposes a weakly supervised method of mining named entities. First, a few manually selected named entities are used as the seeds for a given named entity category. Then the context patterns, the candidate named entities and the users’ clicked URLs are extracted from the query log using the seeds in a bootstrapping process and used to construct a tri-partite graph. Finally, the named entities belonging to the given category are extracted using a random walk algorithm on the graph. The experimental results show that the algorithm can effectively exploit information related to named entities in a query log to improve the performance of mining named entities.

3. Acquiring synonymous attribute phrases for named entities via online encyclopedias. A named entity has a number of attributes that describe its properties or features.
Synonymous attribute phrases are phrases that refer to the same attribute of a named entity category with different surface forms. In named entity queries, attribute phrases are usually used to express an intent to find the corresponding attribute value. Therefore, synonymous attribute phrases are beneficial for analyzing the search intents of named entity queries. This work exploits online encyclopedias to acquire the attribute phrases of named entities and identifies synonymous attributes among them using a classification framework that combines multiple features. To our knowledge, this is the first attempt to acquire synonymous attribute phrases using online encyclopedias. The experimental results show that online encyclopedias are rich resources for acquiring synonymous attribute phrases, from which our approach can effectively acquire a large number of them.

4. Recognizing the intents of named entity queries. This work includes two parts: one identifies query intents by classification, from the perspective of coarse-grained intent analysis; the other acquires search patterns of named entity queries, from the perspective of fine-grained intent analysis. In the query intent classification work, we adopt a classification approach that combines multiple effective features acquired from different resources, including semantic and syntactic analysis of the query text, information obtained from the query log, and the contents of the results returned by the search engine. Query intent classification can limit the search space of a search engine based on the classified information and thus improve the precision of the search results. We use the informational and transactional named entity queries recognized by the query intent classification model to extract query patterns that users often use in queries. The query patterns are clustered into groups, and the patterns in a group share the same search intent.
When a query pattern matches a query issued to the search engine, the search engine can accurately capture the search intent of the query. This work proposes a cascade method in which a graph-based method and a similarity-based method are applied successively to extract query patterns from named entity queries. The experimental results demonstrate that our method can effectively acquire query patterns for multiple named entity categories.

In summary, this dissertation describes research on some crucial techniques of named entity query processing, some of which can be applied not only to named entity queries but also to general queries. The research has achieved some preliminary results, which we hope will be helpful to the task of named entity query processing in search engines.
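
As an illustration of the second contribution (mining named entities with a random walk on a tri-partite graph of candidate entities, context patterns and clicked URLs), the following is a minimal sketch only. The abstract does not say which random walk variant is used, so the sketch assumes a random walk with restart at the seed entities. The function names, the `restart` value, the `#` placeholder in patterns and the toy query-log data are all hypothetical and are not taken from the dissertation.

```python
from collections import defaultdict


def random_walk_scores(edges, seeds, restart=0.15, iterations=50):
    """Random walk with restart on an undirected graph (assumed variant).

    edges: (node_a, node_b) pairs; each node is a hypothetical
           (kind, value) tuple with kind "entity", "pattern" or "url".
    seeds: entity nodes that receive the restart probability mass.
    """
    neighbours = defaultdict(set)
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)

    known_seeds = {s for s in seeds if s in neighbours}
    if not known_seeds:
        raise ValueError("no seed appears in the graph")
    restart_mass = {s: 1.0 / len(known_seeds) for s in known_seeds}

    scores = dict(restart_mass)  # the walk starts at the seeds
    for _ in range(iterations):
        updated = defaultdict(float)
        # Each node passes (1 - restart) of its score evenly to its neighbours.
        for node, score in scores.items():
            share = (1.0 - restart) * score / len(neighbours[node])
            for other in neighbours[node]:
                updated[other] += share
        # The remaining mass jumps back to the seed entities.
        for seed, mass in restart_mass.items():
            updated[seed] += restart * mass
        scores = dict(updated)
    return scores


def rank_candidates(scores, seeds):
    """Return non-seed entity names, highest score first."""
    seed_set = set(seeds)
    candidates = [
        (node[1], score)
        for node, score in scores.items()
        if node[0] == "entity" and node not in seed_set
    ]
    return sorted(candidates, key=lambda item: item[1], reverse=True)


# Toy data (invented): one seed city, two candidate entities.
edges = [
    (("entity", "beijing"), ("pattern", "weather in #")),
    (("entity", "shanghai"), ("pattern", "weather in #")),
    (("entity", "beijing"), ("url", "example.org/cities")),
    (("entity", "shanghai"), ("url", "example.org/cities")),
    (("entity", "shanghai"), ("pattern", "# price")),
    (("entity", "nokia"), ("pattern", "# price")),
]
seeds = [("entity", "beijing")]
scores = random_walk_scores(edges, seeds)
ranking = rank_candidates(scores, seeds)
print(ranking)
```

In this toy graph the candidate that shares both a context pattern and a clicked URL with the seed ("shanghai") ranks above the candidate that is only reachable through an unrelated pattern ("nokia"), which is the intuition behind using graph proximity to the seeds as evidence of category membership.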
Keywords/Search Tags: named entity, query segmentation, synonymous attribute, query intent, query pattern