
Research On Concept-Based Chinese Text Retrieval

Posted on: 2008-03-07
Degree: Master
Type: Thesis
Country: China
Candidate: Y H Zhang
GTID: 2178360215990280
Subject: Computer application technology
Abstract/Summary:
Chinese text retrieval is an important field of information retrieval. Almost all current search engines use keyword-based retrieval, which works by mechanically matching keywords. One drawback of this approach is a low recall rate, which limits the performance of the retrieval system. Concept retrieval systems address this problem as follows. They first build a knowledge base from conceptual information extracted with natural language processing techniques. They then answer users' questions directly by searching that knowledge base for relevant information. This thesis studies text reconstruction and query expansion in Chinese text retrieval. The main contributions are:

1. A term-weighting method based on merging synonymous text keywords, named TKSM (Text Keywords Synonymy Merger), is proposed. On top of it, a text retrieval model based on concept-level semantic synonym expansion, named CSSERM (Concept Semantic Synonymy Expansion Retrieval Model), is constructed. TF-IDF, the typical existing method for computing text term weights, has three main problems: 1) semantic synonymy is not considered; 2) text terms have no fixed weight; 3) core words that support the theme of a text are easily given low weights. TKSM provides an effective way to solve all three problems. Experiments show that CSSERM has slightly lower precision than the keyword retrieval model (KRM) but a higher recall rate, and its overall trade-off between the two is better.

2. A combinative retrieval model (CRM) that combines KRM and CSSERM is presented. Precision and recall are the two criteria used to evaluate a retrieval system. Because CSSERM's precision is slightly lower than KRM's, the two models are combined, and the combination parameters are adjusted to find a better model. Theoretical analysis and experiments show that adjusting these parameters balances precision against recall and yields better retrieval results.

3. Two methods for computing the weights of retrieval concepts and one method for computing text concept weights based on retrieval concept expansion (RCE) are proposed. Two retrieval models based on concept tree expansion are built from these methods. By analyzing how concepts expand across semantic levels, the father-son relationships between concepts in the concept tree are expressed through word similarity. Each of the two retrieval-concept weighting methods is used in one of the two retrieval models, and RCE is used in one of them. Experiments show that both retrieval models match the precision of the keyword retrieval model while greatly improving recall.
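The abstract does not give TKSM's formula. To illustrate the TF-IDF weakness it targets, the following Python sketch compares plain TF-IDF, computed as tf × log(N/df), with a variant that first merges synonymous keywords into one representative term. This is a minimal sketch under stated assumptions, not the thesis's TKSM method. The synonym table, the toy documents, and the function names `merge_synonyms` and `tf_idf` are all hypothetical.

```python
import math
from collections import Counter

# Hypothetical synonym table mapping surface words to one representative term.
# The thesis's actual synonym resource is not described in the abstract.
SYNONYMS = {"pc": "computer", "machine": "computer", "computer": "computer"}

# Toy corpus (made-up data): each document is a list of tokens.
DOCS = [
    ["pc", "computer", "machine", "price"],
    ["weather", "rain", "price"],
    ["rain", "forecast"],
]


def merge_synonyms(tokens, synonyms):
    """Replace each token by its representative term; unknown tokens are kept."""
    return [synonyms.get(t, t) for t in tokens]


def tf_idf(docs):
    """Plain TF-IDF: weight(t, d) = tf(t, d) * log(N / df(t))."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # count each term once per document
    return [
        {t: tf * math.log(n / df[t]) for t, tf in Counter(doc).items()}
        for doc in docs
    ]


if __name__ == "__main__":
    raw = tf_idf(DOCS)
    merged = tf_idf([merge_synonyms(d, SYNONYMS) for d in DOCS])
    print(raw[0])     # "pc", "computer", "machine" each get log(3)
    print(merged[0])  # "computer" now gets 3 * log(3)
```

In the first toy document, plain TF-IDF spreads the core concept's weight across three synonyms. Merging them gives the concept one weight three times as large. This mirrors problems 1) and 3) listed for TF-IDF.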
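The abstract says only that KRM and CSSERM are combined through adjustable combination parameters. This sketch assumes one common choice, not taken from the thesis: a linear interpolation of the two models' scores with a single weight `lam`. It also computes precision and recall so the effect of `lam` is visible. The scores, the threshold of 0.28, the relevance judgments, and all function names are hypothetical toy values.

```python
# Toy scores from two hypothetical models (made-up numbers, not thesis results).
KRM_SCORES = {"d1": 1.0, "d2": 0.8}
CSSERM_SCORES = {"d1": 0.9, "d2": 0.7, "d3": 0.6, "d4": 0.5}
RELEVANT = {"d1", "d2", "d3"}


def combine_scores(krm, csserm, lam):
    """Assumed combination: lam * KRM score + (1 - lam) * CSSERM score."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    docs = set(krm) | set(csserm)
    return {d: lam * krm.get(d, 0.0) + (1 - lam) * csserm.get(d, 0.0) for d in docs}


def retrieve(scores, threshold):
    """Return the documents whose combined score reaches the threshold."""
    return {d for d, s in scores.items() if s >= threshold}


def precision_recall(retrieved, relevant):
    """Precision = hits / retrieved, recall = hits / relevant."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


if __name__ == "__main__":
    for lam in (1.0, 0.5, 0.0):
        scores = combine_scores(KRM_SCORES, CSSERM_SCORES, lam)
        p, r = precision_recall(retrieve(scores, 0.28), RELEVANT)
        print(f"lambda={lam}: precision={p:.2f} recall={r:.2f}")
```

On this toy data, `lam = 1.0` (pure KRM) gives precision 1.0 and recall 2/3. `lam = 0.0` (pure CSSERM) gives precision 0.75 and recall 1.0. `lam = 0.5` retrieves exactly the three relevant documents. This only illustrates how one parameter can trade precision against recall.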
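The abstract states that father-son relationships in the concept tree are expressed through word similarity, but it gives no weighting formulas. This sketch makes two assumptions. First, each tree edge carries a similarity in [0, 1]. Second, an expanded concept's weight is the product of the similarities along the tree path from the retrieval concept. Under those assumptions, expansion becomes a breadth-first search limited to a few semantic levels. The tree, the similarity values, and the names `PARENT`, `expand`, and `score` are hypothetical.

```python
from collections import deque

# Hypothetical concept tree: son concept -> (father concept, word similarity).
# The similarity values are made up for illustration.
PARENT = {
    "laptop": ("computer", 0.8),
    "desktop": ("computer", 0.8),
    "computer": ("device", 0.6),
    "phone": ("device", 0.5),
}


def neighbours(concept, parent):
    """Yield (concept, similarity) for the father and every son of `concept`."""
    if concept in parent:
        yield parent[concept]
    for son, (father, sim) in parent.items():
        if father == concept:
            yield son, sim


def expand(query_concept, parent, max_depth=2):
    """Breadth-first expansion up to max_depth levels.

    Each expanded concept is weighted by the product of the similarities along
    the tree path from the query concept (an assumed weighting scheme).
    """
    weights = {query_concept: 1.0}
    queue = deque([(query_concept, 0)])
    while queue:
        concept, depth = queue.popleft()
        if depth == max_depth:
            continue
        for other, sim in neighbours(concept, parent):
            if other not in weights:  # in a tree, each concept has one path
                weights[other] = weights[concept] * sim
                queue.append((other, depth + 1))
    return weights


def score(doc_concepts, expanded):
    """Score a document by summing the weights of expanded concepts it contains."""
    return sum(w for c, w in expanded.items() if c in doc_concepts)


if __name__ == "__main__":
    expanded = expand("laptop", PARENT)
    print(expanded)
    print(score({"desktop", "price"}, expanded))
```

With the default two levels, `expand("laptop", PARENT)` reaches computer (0.8), desktop (0.64), and device (0.48). Reaching phone takes a third level. A document that mentions only "desktop" still receives a nonzero score, which is how expansion can raise recall.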
Keywords: Natural Language Processing, Text Retrieval, Retrieval Model, Concept Expansion, Weight Computation