
Conceptual Analysis Of User Queries In Information Retrieval

Posted on: 2010-08-12
Degree: Doctor
Type: Dissertation
Country: China
Candidate: H Liu
GTID: 1118360305456625
Subject: Computer Science and Technology
Abstract/Summary:
In the information age, people need to extract relevant information from vast amounts of data every day. Current information retrieval (IR) systems offer users only limited help because of their low precision: the large amount of useless noise in search results brings users annoyance rather than help. The crux of the problem is that current IR systems are based on the Boolean model, which manages terms 'discretely' through operators such as AND and OR. They extract discrete keywords from user queries and web documents for matching. This method cuts off the conceptual relations between keywords, so the keywords lose their conceptual integrity as an expression of the topic, and noise is created in the process. This work starts from that problem. The primary bottleneck is not algorithmic efficiency but how to preserve the integrity of users' query concepts throughout the IR process. The important starting point is therefore to explore how to index queries by conceptual analysis. The basic unit of indexing is no longer a string but an integral structure of concepts, represented by Chinese words and their relations.

The author studies the conceptual analysis and indexing of Chinese user queries, an important component of conceptual information retrieval. The author recovers users' information needs by capturing the inner relations between concepts so that the integrity of the concepts is preserved, which directly affects IR performance. Unlike document analysis, query analysis aims not only to produce a conceptual representation of a query but, more importantly, to capture the intensional features of the concepts behind the information need in the user's mind. This work focuses on the phenomena of user queries, namely the analysis of concepts and the summarization of conceptual representations. The author establishes some exploratory methods for restricted query analysis in the hope that they can be extended to general queries.

On the one hand, this work is an important component of the conceptual retrieval model and provides valuable ideas and methods for the conceptual analysis of queries. On the other hand, it benefits several active topics in natural language processing, including named-entity recognition, grammar debugging, and semantic analysis.

The novelties of this work are as follows.

1. The query analysis fully expresses the characteristics of Chinese. Conceptual analysis and its representation show that the naming of entities in Chinese reflects a coupling of concepts: class name + emergent distinctive features. The conceptual semantic analysis is novel.
2. The author selects Chinese compounds as the core structure for Chinese words and expressions. The advantage is that compounds express concepts concisely. Moreover, because compounds undergo no morphological change, they could plausibly be used to simplify alignment in multilingual machine translation. This rests on the hypothesis that concepts in cognition are similar across languages.
3. The foundation of conceptual analysis for queries is the study of the queries themselves. The author analyzes examples from a real query set to summarize several principles of conceptual analysis and the types of query concepts. He also presents some common relation names in queries and some conceptual models for queries.
4. The author proposes a novel debugging approach for unification grammars that makes it convenient to adapt an existing unification-based parser to a new domain, such as query analysis. The author models the grammar as a Kripke structure, which is then verified by model checking. The proposed method can discover errors in the grammar automatically and can therefore lower debugging complexity significantly.
5. A named-entity recognition method based on Web mining is proposed for complex named entities in queries. Based on this method, the author builds a two-layer prototype system, CQUA-1, for the analysis of keyword-style queries. CQUA-1 first matches parts of the query to a concept frame graph that reflects the domain knowledge. The unmatched concepts are then attached to that graph to form the final conceptual graph. CQUA-1 balances the analysis of domain knowledge and general knowledge.
6. The author proposes an approach, CQUA-2, for the conceptual analysis of queries in the form of WH-questions. This approach treats conceptual analysis as an example-based machine translation problem: the questions are the source language and the conceptual graphs are the target language. Evaluation shows that CQUA-2 performs well even with a small example base. In computing sentence similarity, the author takes lexical semantics and syntactic information into account. The word similarity method proposed here is an ensemble model that combines several computing models. It not only benefits CQUA-2 but also achieves very good results on a standard evaluation set.
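
The abstract does not say which component models the word similarity ensemble combines or how their scores are merged. As a rough illustration only, the sketch below assumes a weighted average over interchangeable scoring functions. The names `char_overlap`, `exact_match`, and `ensemble_similarity`, and the two toy component measures, are hypothetical and are not taken from the dissertation.

```python
from typing import Callable, Sequence

# A component model maps a word pair to a score in [0, 1].
Similarity = Callable[[str, str], float]


def char_overlap(a: str, b: str) -> float:
    """Jaccard overlap of the character sets (a stand-in lexical measure)."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def exact_match(a: str, b: str) -> float:
    """1.0 when the words are identical, otherwise 0.0."""
    return 1.0 if a == b else 0.0


def ensemble_similarity(
    a: str,
    b: str,
    models: Sequence[Similarity],
    weights: Sequence[float],
) -> float:
    """Weighted average of component scores (an assumed combination rule)."""
    if not models or len(models) != len(weights):
        raise ValueError("models and weights must be non-empty and equal in length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(w * m(a, b) for m, w in zip(models, weights)) / total


# Two Chinese compounds that share one of three distinct characters.
score = ensemble_similarity("电脑", "电视", [char_overlap, exact_match], [1.0, 1.0])
```

In a real system the component functions would be replaced by thesaurus-based or corpus-based measures, and the weights would be tuned on an evaluation set rather than fixed by hand.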
Keywords/Search Tags: Query Analysis, Conceptual Graphs, Semantic Analysis, Information Retrieval