The Research And Construction Of A Chinese Semantic Corpus

Posted on: 2007-11-15  Degree: Master  Type: Thesis
Country: China  Candidate: R Xu  Full Text: PDF
GTID: 2178360185478214  Subject: Computer application technology
Abstract/Summary:
After a long period of development, natural language processing (NLP) research faces a bottleneck in semantic knowledge acquisition, so the construction of semantic resources is an important topic in NLP. This thesis addresses the construction of a semantic corpus. The approach takes the 1998 China Daily corpus as the sense-tagging object and HowNet as the semantic knowledge source. L2bank, a semantic corpus with HowNet tags, is constructed with the proposed methods.

Aiming at the construction of the corpus with the sense-tagging technique, the thesis consists mainly of the following work:

First, after reviewing the construction of semantic corpora and related semantic resources, the thesis takes HowNet as the semantic source and the China Daily corpus as the tagging object. Oracle 9iFS is chosen as the underlying platform, and on that basis a general design and construction schema is proposed.

Secondly, the thesis discusses sense-tagging techniques and puts forward a computing model of semantic relevancy based on HowNet. The model is applied to the semantic tagging of polysemous words, which has been a bottleneck in semantic knowledge acquisition. The experimental results show that the accuracy of sense disambiguation is 80%, and that the method greatly reduces the manual work required during corpus construction.

Thirdly, the thesis describes the construction of the L2bank corpus in more detail. According to the characteristics of the corpus, 42 APIs are designed and implemented. The corpus is evaluated and analyzed on a large scale.

Semantic corpus construction makes further progress in NLP possible. In principle, the thesis has implemented a general architecture of a...
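
The abstract does not give the details of the relevancy model, so the following is only an illustrative sketch of how sense tagging by semantic relevancy could work in principle. It assumes that each sense of a word is described by a set of sememes, scores two senses by the Jaccard overlap of their sememe sets, and picks the sense of a polysemous word that is most relevant to its context words. The inventory `SENSES`, the functions `relevancy` and `disambiguate`, and the Jaccard measure are all assumptions made for illustration; they are not the thesis's model and not real HowNet entries.

```python
from typing import Dict, List, Optional, Set

# Hypothetical toy sense inventory: word -> {sense label -> sememe set}.
# These entries are invented for illustration; they are NOT real HowNet records.
SENSES: Dict[str, Dict[str, Set[str]]] = {
    "bank": {
        "bank|institution": {"institution", "money", "commerce"},
        "bank|riverside": {"land", "water", "edge"},
    },
    "loan": {"loan|lend": {"money", "commerce", "lend"}},
    "river": {"river|waters": {"water", "land", "flow"}},
}


def relevancy(a: Set[str], b: Set[str]) -> float:
    """Jaccard overlap of two sememe sets (an assumed stand-in measure)."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def disambiguate(target: str, context: List[str]) -> Optional[str]:
    """Return the sense of `target` most relevant to the context words."""
    best_sense: Optional[str] = None
    best_score = -1.0
    for sense, sememes in SENSES.get(target, {}).items():
        score = 0.0
        for word in context:
            candidates = SENSES.get(word, {})
            if candidates:
                # Credit the best-matching sense of each known context word.
                score += max(relevancy(sememes, s) for s in candidates.values())
        if score > best_score:
            best_sense, best_score = sense, score
    return best_sense


print(disambiguate("bank", ["loan"]))
print(disambiguate("bank", ["river"]))
```

With this toy inventory, the context word "loan" shares two sememes with the institution sense and none with the riverside sense, so the institution sense is selected; "river" tips the choice the other way. A real system would draw sememes from HowNet and would likely weight them by their position in the sememe hierarchy.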
Keywords/Search Tags: Chinese Language Processing, semantic corpus, HowNet, semantic relevancy