Research On The Sense Guessing Of Chinese Unknown Words

Posted on: 2016-12-02
Degree: Master
Type: Thesis
Country: China
Candidate: F F Shang
Full Text: PDF
GTID: 2308330464964465
Subject: Computer application technology
Abstract/Summary:
Understanding the meaning of words is essential to many Natural Language Processing (NLP) tasks, and to text understanding in particular, since it is the basic technique on which text understanding rests. However, many unknown words are not recorded under any syntactic or semantic category, which makes it difficult for users to understand the content of texts. To address this issue, this thesis focuses on sense guessing for Chinese unknown words. Improving sense guessing for Chinese unknown words is an important research task because it has many NLP applications, such as machine translation, information retrieval and semantic analysis. Supported by the National Natural Science Foundation project "Research on Key Technology of Chinese All-word Sense Tagging", I studied Chinese unknown words from several aspects. The main contributions of this thesis are as follows:

1. Sense guessing of Chinese unknown words based on "synonyms Cilin extended version". I constructed three models: an overlapping model, a word-category association model and a rule-based model. The main steps in model construction are preprocessing the corpus, selecting test sets and development sets, and calculating the word statistics required by the models. After analyzing the unknown words not covered by the rule-based model, I added new rules based on knowledge of affixes and quasi-affixes to improve its prediction performance.

2. Sense guessing of Chinese unknown words based on the "Semantic Knowledge-base of Contemporary Chinese".
The research conducted with this dictionary covers the following aspects: (1) following the tree structure of the dictionary, I constructed semantic dictionaries at different semantic levels; (2) based on the semantic dictionaries at each level, I introduced three models: the overlapping model, the word-category association model and the rule-based model; (3) I extracted words from the dictionary to serve as unknown words and then used each model to predict their senses; (4) the rule-based model achieved higher precision in sense guessing, and I integrated it with the other semantic models to predict the senses of unknown words and obtain better prediction performance.

3. Annotation of the People’s Daily corpus from 2000. Few existing corpus resources include semantic annotation of unknown words. In this thesis, I carried out semantic prediction and annotation of the unknown words in the People’s Daily published in 2000, based on the overlapping model, the word-category association model and the rule-based model. The main annotation work includes: extracting unknown words from the corpus and compiling statistics on them; using the semantic prediction models based on "synonyms Cilin extended version" and on different levels of the "Semantic Knowledge-base of Contemporary Chinese" to obtain semantic predictions; and analyzing the results and using the integrated model to predict senses and annotate the corpus. The result is a corpus resource with sense annotation of unknown words.
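
The abstract does not give the scoring formulas or the rule set, so the following Python sketch is only an illustration of how a rule-based model and a word-category association model might be integrated. It assumes that the association model scores a category by how often each component character, in a given position, co-occurs with that category in a lexicon, and that the rule-based model maps suffixes to categories. The lexicon entries, the category labels, the suffix rules and all function names are hypothetical and are not taken from the thesis or from either dictionary.

```python
from collections import Counter, defaultdict

# Hypothetical toy lexicon: known word -> semantic category label.
LEXICON = {
    "作家": "Person", "画家": "Person", "专家": "Person",
    "电车": "Vehicle", "火车": "Vehicle", "汽车": "Vehicle",
    "电话": "Device", "电脑": "Device",
}

# Hypothetical suffix rules standing in for affix/quasi-affix knowledge.
SUFFIX_RULES = {"家": "Person", "车": "Vehicle"}


def position(index, length):
    """Label a character's position inside a word."""
    if index == 0:
        return "first"
    if index == length - 1:
        return "last"
    return "middle"


def build_association(lexicon):
    """Count how often each (character, position) pair occurs with each category."""
    counts = defaultdict(Counter)
    for word, category in lexicon.items():
        for i, ch in enumerate(word):
            counts[(ch, position(i, len(word)))][category] += 1
    return counts


def guess_by_association(word, counts):
    """Sum, over the word's characters, the relative frequency of each category."""
    scores = Counter()
    for i, ch in enumerate(word):
        seen = counts.get((ch, position(i, len(word))))
        if not seen:
            continue
        total = sum(seen.values())
        for category, n in seen.items():
            scores[category] += n / total
    if not scores:
        return None
    # sorted() makes tie-breaking deterministic (alphabetical by label).
    return max(sorted(scores), key=scores.get)


def guess_by_rules(word, rules):
    """Return the category of the first matching suffix rule, if any."""
    for suffix, category in rules.items():
        if len(word) > len(suffix) and word.endswith(suffix):
            return category
    return None


def guess_sense(word, counts, rules=SUFFIX_RULES):
    """Integrated guess: trust the rules first, fall back to association."""
    return guess_by_rules(word, rules) or guess_by_association(word, counts)


COUNTS = build_association(LEXICON)
print(guess_sense("歌唱家", COUNTS))
print(guess_sense("电梯", COUNTS))
```

With this toy data, "歌唱家" is resolved by the suffix rule for "家", while "电梯" matches no rule and falls back to the association scores for "电" in first position. Consulting the rules first reflects the abstract's remark that the rule-based model has the higher precision; the order in which the thesis actually combines the models is not stated and is assumed here.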
Keywords/Search Tags: Chinese Unknown Words, Sense Guessing of Chinese Unknown Words, Semantic Annotation, Affix, Integration