A Study of Domain Term Extraction and Relation Classification

Posted on: 2010-02-24
Degree: Master
Type: Thesis
Country: China
Candidate: Y X Qiu
Full Text: PDF
GTID: 2208330332978193
Subject: Control theory and control engineering
Abstract/Summary:
With the development of information technology and of specialized domains, old terms continually fall out of use and many new terms emerge. Changes in terminology reflect, to a certain extent, changes in the domain itself, so recognizing domain terms has become increasingly important. Classifying the relations between domain words also helps uncover deeper information about a domain. Focusing on the characteristics of Chinese and of the target domain, this thesis studies domain corpus processing, candidate word extraction, domain term identification, and domain word relation classification. The main contributions are as follows:
(1) A new method of domain corpus processing that combines Chinese and domain characteristics. The domain corpus is processed with a word segmentation tool and coarse segmentation, which provides more effective resources for domain term extraction and identification.
(2) Extraction of domain candidate words using mutual information and the log-likelihood ratio. Both measures are used to compute the internal association strength of a candidate character string, which checks that the string is a legitimate linguistic unit, and the candidate word set is built from the strings that pass. Using the two measures addresses the problem that sparse data strongly degrades extraction accuracy. Experiments show that the method performs well.
(3) Domain term identification based on bootstrapping. Domain seed words are first specified manually; starting from these seed words and combined with a t-test, the method identifies the domain terms in the candidate word set. Experiments show that the method achieves high accuracy.
(4) Domain word relation classification based on SVM. The method mainly extracts contextual features between two domain words to classify the relation between them. Experiments show that the method is feasible.
(5) Based on the research above, a prototype system for term identification and relation classification in the Yunnan tourism domain is designed and implemented.
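As a rough illustration of the two association measures named in item (2), the sketch below scores adjacent character pairs by pointwise mutual information and by the log-likelihood ratio computed from a 2x2 contingency table. It is not the thesis's implementation: the function name `association_scores`, the restriction to two-character strings, and the toy input are assumptions made only for this example.

```python
import math
from collections import Counter


def association_scores(text):
    """Return {(a, b): (pmi, llr)} for every adjacent character pair in text.

    Hypothetical helper for illustration; whitespace is ignored.
    """
    chars = [c for c in text if not c.isspace()]
    pairs = list(zip(chars, chars[1:]))
    n = len(pairs)
    pair_counts = Counter(pairs)
    left_counts = Counter(a for a, _ in pairs)   # a in first position
    right_counts = Counter(b for _, b in pairs)  # b in second position

    def term(observed, expected):
        # One cell's contribution O * ln(O / E); empty cells contribute 0.
        return observed * math.log(observed / expected) if observed > 0 else 0.0

    scores = {}
    for (a, b), k11 in pair_counts.items():
        k12 = left_counts[a] - k11    # a followed by something other than b
        k21 = right_counts[b] - k11   # b preceded by something other than a
        k22 = n - k11 - k12 - k21     # pairs involving neither pattern
        row1, row2 = k11 + k12, k21 + k22
        col1, col2 = k11 + k21, k12 + k22
        # Pointwise mutual information: log2 of observed over expected count.
        pmi = math.log2(k11 * n / (row1 * col1))
        # Log-likelihood ratio (G^2) over the four contingency cells.
        llr = 2.0 * (
            term(k11, row1 * col1 / n)
            + term(k12, row1 * col2 / n)
            + term(k21, row2 * col1 / n)
            + term(k22, row2 * col2 / n)
        )
        scores[(a, b)] = (pmi, llr)
    return scores


if __name__ == "__main__":
    # Toy input, not taken from the thesis corpus.
    for pair, (pmi, llr) in association_scores("云南旅游云南丽江旅游").items():
        print("".join(pair), round(pmi, 3), round(llr, 3))
```

Pairs that score high on both measures would be natural candidates for the word set. Any cut-off is corpus-dependent, and the abstract does not state the thresholds used.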
Keywords/Search Tags:Terms, Domain terms, Relation classification