
Research and Implementation of Sense Guessing for Chinese Unknown Words for Full-Text Annotation

Posted on: 2018-04-29
Degree: Master
Type: Thesis
Country: China
Candidate: W L Bai
GTID: 2348330518491127
Subject: Computer software and theory
Abstract/Summary:
Chinese unknown words are words that appear in a corpus but are not included in the dictionary. With the development of the times and the popularity of the Internet, new words are emerging in large numbers, and the overall number of Chinese unknown words grows rapidly year by year. This poses a significant challenge to many tasks in Natural Language Processing, so the processing of Chinese unknown words deserves sufficient attention.

At present, the processing of Chinese unknown words consists mainly of three parts: recognition, part-of-speech tagging, and semantic prediction. There is much research on recognition and part-of-speech prediction of Chinese unknown words, but relatively little on their semantic prediction. This thesis focuses on semantic prediction. We summarize earlier research on the semantic prediction of Chinese unknown words and find that it relies mainly on two kinds of features: internal features and external features. Internal features refer primarily to the composition and part of speech of an unknown word. External features refer primarily to the contexts in which the unknown word appears in the corpus.

This thesis builds on previous research, in which semantic prediction models for Chinese unknown words are based mainly on internal features. However, related studies have shown that external features are also valuable for semantic prediction. This thesis therefore combines internal and external features and designs two semantic prediction models: one based on term vectors and one based on Baidu Encyclopedia. Through comparative experiments, we analyze the strengths and weaknesses of the integrated model from previous research and of the two models above. Combining the advantages of each model, we design a cascade model for the semantic prediction of Chinese unknown words. Experimental results demonstrate the validity of the approach. The research aims to improve on the semantic prediction performance of the integrated model from previous research and, on this basis, to complete the semantic annotation of Chinese unknown words in the 2000 People's Daily corpus.
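The abstract does not say how the cascade orders or combines its component models, so the following is only a minimal sketch of the general idea, not the thesis's implementation. It assumes each model returns a (semantic class, confidence) pair or nothing, that the cascade accepts the first answer whose confidence reaches a threshold, and that it otherwise falls back to the most confident answer seen. The function names, the final-character rule used as an internal feature, the threshold, and all toy vectors and class labels are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def make_vector_model(vectors, known_classes):
    """External feature (assumed form): copy the class of the most similar known word."""
    def predict(word):
        if word not in vectors:
            return None
        candidates = [
            (cosine(vectors[word], vectors[known]), cls)
            for known, cls in known_classes.items()
            if known in vectors
        ]
        if not candidates:
            return None
        score, cls = max(candidates)
        return cls, score
    return predict


def make_final_char_model(char_classes, confidence=0.5):
    """Internal feature (assumed form): class associated with the word's final character."""
    def predict(word):
        cls = char_classes.get(word[-1]) if word else None
        return (cls, confidence) if cls else None
    return predict


def cascade(models, word, threshold=0.6):
    """Try models in order; accept the first confident answer, else the best fallback."""
    fallback = None
    for model in models:
        result = model(word)
        if result is None:
            continue
        cls, score = result
        if score >= threshold:
            return cls
        if fallback is None or score > fallback[1]:
            fallback = result
    return fallback[0] if fallback else None


# Toy data, invented for illustration only.
vectors = {"网购": [0.9, 0.1], "购物": [1.0, 0.0], "苹果": [0.0, 1.0]}
known_classes = {"购物": "activity", "苹果": "food"}
char_classes = {"族": "person"}

models = [
    make_vector_model(vectors, known_classes),
    make_final_char_model(char_classes),
]
```

With this toy data, `cascade(models, "网购")` is answered by the vector model, while `cascade(models, "月光族")` has no vector and falls through to the final-character rule. A model based on an encyclopedia lookup could be added to the list in the same way, provided it follows the same return convention.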
Keywords/Search Tags: Chinese Unknown Words, Sense Guessing of Chinese Unknown Words, Internal Feature, External Feature, Cascade Model, Semantic Annotation