
The Study Of Automatic Chinese Term Extraction

Posted on: 2007-05-03
Degree: Master
Type: Thesis
Country: China
Candidate: Y Zhang
Full Text: PDF
GTID: 2178360182989258
Subject: Computer software and theory
Abstract/Summary:
Automatic Term Extraction is a fundamental problem in Chinese Information Processing. It has been applied in many other areas of Natural Language Processing, such as Natural Language Generation, Computational Lexicography, Parsing, Corpus Linguistics, Statistical Machine Translation, Information Retrieval, Text Classification and Text Summarization. For open corpora, term extraction is even more significant.

The main difficulty of Chinese Term Extraction is that algorithms that work well for extracting single-word terms are hard to apply to multi-word terms. Based on an analysis of the statistical and linguistic characteristics of terms, this thesis summarizes two term features, Unithood and Termhood, and classifies Chinese terms into two kinds according to their composition: simple terms and complex terms. We investigate Automatic Chinese Term Extraction further by comparing and analyzing various term extraction models.

First, we set up a model for the decomposition of complex terms. Drawing on research into the characteristics of Chinese terms and on other researchers' work, we study the relationship between simple terms and complex terms and propose a decomposition model for complex terms. Second, we design the parameter F-MI and the formulas used in Chinese Term Extraction by combining the advantages of the C-value parameter and mutual information. The parameter is designed according to term composition. Third, we design and implement an Automatic Term Extraction system for open corpora. Fourth, we propose building a group of incorrect terms, which not only helps to avoid the effects of incorrect terms and to reduce the cost of term evaluation, but also allows the extraction algorithm to be revised on the basis of further study of incorrect terms, improving extraction precision.

Finally, we test the algorithm proposed in the thesis against the traditional algorithm on a large number of web pages downloaded from the Internet. Our method reaches a precision of up to 73.2%, which is better than the traditional methods and indicates the effectiveness and feasibility of the methods proposed in the thesis.
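The abstract does not give the F-MI formula, so it is not reproduced here. As background only, the following minimal sketch shows the two standard measures that F-MI is said to build on: pointwise mutual information between two adjacent units, and the commonly cited C-value for nested multi-word candidates. The function names, the tuple-of-words representation and the toy frequency table are assumptions made for illustration and are not taken from the thesis.

```python
import math


def pmi(pair_count, left_count, right_count, total):
    """Pointwise mutual information (base 2) of two adjacent units.

    All arguments are raw corpus counts; `total` is the number of
    observation positions used to turn counts into probabilities.
    """
    p_xy = pair_count / total
    p_x = left_count / total
    p_y = right_count / total
    return math.log2(p_xy / (p_x * p_y))


def is_nested(short, long):
    """True if `short` occurs as a contiguous subsequence of `long`."""
    k = len(short)
    return any(long[i:i + k] == short for i in range(len(long) - k + 1))


def c_value(candidate, freq):
    """Standard C-value of `candidate` (a tuple of words).

    `freq` maps every candidate tuple to its corpus frequency.
    Note: log2(1) == 0, so single-word candidates always score 0
    under this plain formulation.
    """
    n = len(candidate)
    # Longer candidates that contain this one.
    containers = [t for t in freq if len(t) > n and is_nested(candidate, t)]
    f = freq[candidate]
    if not containers:
        return math.log2(n) * f
    # Discount by the mean frequency of the containing candidates.
    nested_mean = sum(freq[t] for t in containers) / len(containers)
    return math.log2(n) * (f - nested_mean)


# Hypothetical toy counts, for illustration only.
toy_freq = {
    ("information", "retrieval"): 10,
    ("information", "retrieval", "system"): 4,
    ("web", "information", "retrieval"): 2,
}
print(pmi(8, 16, 16, 64))
print(c_value(("information", "retrieval"), toy_freq))
```

In this sketch mutual information plays the role of a Unithood measure (how strongly the parts stick together), while C-value leans toward Termhood by discounting candidates that mostly appear inside longer candidates. How the thesis actually combines them into F-MI is not stated in the abstract.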
Keywords/Search Tags: Automatic Chinese Term Extraction, Term Characteristic, Simple Term, Complex Term, Decomposition of Complex Term