CCD-Based Terminology Extraction Study

Posted on: 2008-01-04
Degree: Master
Type: Thesis
Country: China
Candidate: G C Duan
GTID: 2208360215460483
Subject: Computer software and theory
Abstract/Summary:
Automatic term extraction (ATE) is one of the most important tasks in computational terminology. Its primary goal is to identify, in a collection of texts from a specific subject field, the set of text units (e.g. words) that represent the key concepts of that field. Term extraction is a fundamental problem in natural language processing and has been applied in many of its subfields, such as natural language generation, computational lexicography, parsing, corpus linguistics research, statistical machine translation, information retrieval, text classification, and text summarization.

This thesis first introduces the definition and characteristics of terms, then surveys existing methods of automatic term extraction together with their strengths and weaknesses. We propose an approach to Chinese single-word term extraction that combines a dictionary-based method with a seed-knowledge-based method, drawing on the linguistic and statistical attributes of terms. We also report our work on purifying the Chinese Concept Dictionary (CCD). We then propose a method for bigram term extraction that addresses some of the difficulties in term extraction. Finally, we describe our term extraction platform, a system that supports term recognition by field experts.

The main contributions of this thesis are as follows:
① We introduce the definition and characteristics of terms, and the linguistic resources used, such as the Chinese Concept Dictionary, the Law Grocery, and the Bilingual Law Information Corpus.
② We describe the Chinese Concept Dictionary in detail, along with its problems, and propose a method for purifying CCD automatically.
③ We propose an approach to Chinese single-word term extraction that combines the dictionary-based method with a seed-knowledge-based method based on semantic relationships. This enriches the Law Grocery and prepares for the extraction of bigram terms.
④ We propose a bigram term extraction approach that combines the linguistic characteristics of terms with traditional statistical methods. It also serves as the foundation for multiword term extraction.

The experiments show that the ideas and methods in this thesis are effective and feasible. We are designing a program to speed up term extraction. The methods and conclusions of this thesis can also serve as a reference for research on automatic term extraction.
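
As a rough illustration of what combining linguistic characteristics with a statistical measure can look like for bigram candidates, the sketch below filters adjacent word pairs by part-of-speech pattern and ranks the survivors by pointwise mutual information (PMI). The abstract does not name the statistical measure or the linguistic filter it uses, so PMI, the tag patterns in `ALLOWED_PATTERNS`, the thresholds, and the function name `extract_bigram_terms` are all assumptions made for this example, not the thesis's method. Input is assumed to be already word-segmented and POS-tagged.

```python
import math
from collections import Counter

# Hypothetical POS patterns treated as term-like (an assumption for this
# sketch; the abstract does not specify the linguistic filter).
ALLOWED_PATTERNS = {("n", "n"), ("a", "n"), ("v", "n")}


def extract_bigram_terms(sentences, min_count=2, min_pmi=1.0):
    """Rank candidate bigram terms by PMI after a POS-pattern filter.

    sentences: list of sentences, each a list of (word, pos) pairs that
    have already been segmented and tagged by some external tool.
    Returns a list of ((word1, word2), pmi) sorted by descending PMI.
    """
    unigrams = Counter()
    bigrams = Counter()
    for sent in sentences:
        unigrams.update(word for word, _ in sent)
        # Count adjacent tagged pairs within each sentence.
        bigrams.update(zip(sent, sent[1:]))

    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())

    scored = []
    for ((w1, p1), (w2, p2)), count in bigrams.items():
        # Linguistic filter: keep only allowed POS patterns.
        if (p1, p2) not in ALLOWED_PATTERNS:
            continue
        # Frequency filter: PMI is unreliable for rare pairs.
        if count < min_count:
            continue
        p_xy = count / n_bi
        p_x = unigrams[w1] / n_uni
        p_y = unigrams[w2] / n_uni
        pmi = math.log2(p_xy / (p_x * p_y))
        if pmi >= min_pmi:
            scored.append(((w1, w2), pmi))

    return sorted(scored, key=lambda item: item[1], reverse=True)
```

Calling `extract_bigram_terms(tagged_sentences)` on a tagged legal corpus would return candidate pairs for an expert to review. Other association measures, such as the log-likelihood ratio or C-value, could be substituted for PMI without changing the overall structure.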
Keywords/Search Tags:term extraction, Chinese Concept Dictionary, law term, term extraction plat