Research on Chinese Term Extraction in the Military Domain

Posted on: 2014-02-01
Degree: Master
Type: Thesis
Country: China
Candidate: J W Tian
Full Text: PDF
GTID: 2248330398450010
Subject: Computer application technology
Abstract/Summary:
A term is the basic linguistic unit for describing domain knowledge and generally consists of a word or phrase. The process of automatically obtaining domain-specific terms from text is called term extraction. Term extraction is an important technology with a wide range of applications in natural language processing, text mining, ontology building, lexicography, machine translation, and other fields. Automatic term extraction plays an important role in understanding and tracking the development of domain knowledge.

Military terms are a special class of domain terms. Extracting them automatically not only acquires and expands military domain knowledge but also significantly reduces the cost of manual collection and information processing, which allows us to focus on deeper analysis of intelligence. The extraction of military terms is therefore of great importance in national defense and military affairs.

This thesis focuses on Chinese term extraction in the military domain and describes in detail the composition and characteristics of military terms as they occur in practice. After several common statistical machine learning models are analyzed and compared, the widely used conditional random fields (CRF) model is applied to Chinese term extraction in the military domain, achieving a precision, recall, and F-score of 72.83%, 71.81%, and 72.05%, respectively. Then, to reduce the CRF-based method's dependence on corpus size and annotators, an unsupervised, statistics-based method is tried for extracting military terms. Experiments with three statistics (information entropy, mutual information, and C-value) are carried out and the performance of the three resulting systems is analyzed, but the experimental F-score is only 20.68%. Finally, comparison experiments show that each of the two methods has its own advantages, and that the CRF-based term extraction system is simpler, more feasible, and yields better results.
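
As an illustration of two of the statistics named in the abstract, the sketch below computes pointwise mutual information for adjacent tokens and a C-value score for candidate terms. It is a minimal sketch under stated assumptions, not the thesis's implementation: the function names, the input formats, and the English sample tokens are all hypothetical, and the abstract does not say how the thesis segments text, selects candidates, or combines the statistics. The C-value formula used here is the standard one (length-weighted frequency, discounted by the average frequency of longer candidates that contain the term).

```python
import math
from collections import Counter


def pointwise_mutual_information(tokens):
    """Return base-2 PMI for every adjacent token pair in `tokens`.

    Assumption: `tokens` is an already segmented sequence of words.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)
    n_bi = len(tokens) - 1
    scores = {}
    for (x, y), count in bigrams.items():
        p_xy = count / n_bi
        p_x = unigrams[x] / n_uni
        p_y = unigrams[y] / n_uni
        scores[(x, y)] = math.log2(p_xy / (p_x * p_y))
    return scores


def c_value(candidate_freqs):
    """Return C-value scores for candidate terms.

    Assumption: `candidate_freqs` maps a tuple of words to its corpus
    frequency. Single-word candidates score 0 because log2(1) == 0;
    the measure is meant for multi-word terms.
    """
    def contains(longer, shorter):
        # True if `shorter` occurs as a contiguous sub-sequence of `longer`.
        n = len(shorter)
        return len(longer) > n and any(
            longer[i:i + n] == shorter
            for i in range(len(longer) - n + 1)
        )

    scores = {}
    for term, freq in candidate_freqs.items():
        # Frequencies of the longer candidates in which `term` is nested.
        nesting = [f for other, f in candidate_freqs.items()
                   if contains(other, term)]
        if nesting:
            adjusted = freq - sum(nesting) / len(nesting)
        else:
            adjusted = freq
        scores[term] = math.log2(len(term)) * adjusted
    return scores


# Hypothetical candidates and counts, for illustration only.
sample_candidates = {
    ("air", "defense", "missile"): 4,
    ("defense", "missile"): 10,
}
sample_scores = c_value(sample_candidates)
```

In this sample, the shorter candidate is discounted by the frequency of the longer candidate that contains it, so a term that mostly appears inside a longer term is ranked lower than its raw frequency would suggest.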
Keywords/Search Tags: Domain Terms, Term Extraction, Conditional Random Fields, Military Domain, Feature Template