
A Method For POS Guessing Of Chinese Unknown Words Based On Combination Model

Posted on: 2012-06-21
Degree: Master
Type: Thesis
Country: China
Candidate: Y B Wen
Full Text: PDF
GTID: 2218330338456029
Subject: Computer software and theory
Abstract/Summary:
In today's information age, the rapid development of society, the economy, and culture generates a large number of Chinese unknown words. These words enrich the language, but they also pose unprecedented challenges for Chinese information processing, because they have no uniform, standard definitions and are used flexibly. Given the volume and variety of information resources now available, compiling dictionaries by hand is clearly impossible, and if computers are to identify unknown words automatically and accurately, they must also judge the part of speech (POS) of those words automatically. This thesis focuses on POS guessing for Chinese unknown words: the process by which a computer assigns each unknown word a suitable POS tag, such as noun, verb, adjective, or another POS.
POS guessing for Chinese unknown words is a key area of Chinese information processing; it is both a "bottleneck" problem and an active research topic. Although Chinese information processing has attracted many linguists and computer scientists, comparatively little work has addressed POS guessing for unknown words. The existing methods in the literature each have their own characteristics, but overall they remain inadequate: their guessing accuracy is not very good, which leaves considerable room for further research.
Building on previous research, this thesis proposes a POS guessing method for Chinese unknown words based on a combination model.
This method takes both the internal and the external features of Chinese unknown words into account and comprises three models. The first model, a machine-learning method, uses the internal features of a word to guess its POS. The reliability of each guess is then tested, and for unknown words whose guesses have low credibility, the combination algorithm applies the second and third models. The second model is a context-based POS guessing model that examines the external characteristics of a word, namely the context information of its neighboring words. The third model is a POS guessing model based on character position that examines the internal features of a word, namely the role of the character at each position in the word. The purpose is to improve POS guessing accuracy for unknown words by exploiting the advantages of the three individual models, to reduce the impact of Chinese unknown words on Chinese word processing, and to obtain better segmentation results. In the experiments, the combination method achieved 94.92% accuracy, indicating that it can guess the POS of Chinese unknown words more accurately than existing methods.
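The cascade described in the abstract can be pictured with a minimal Python sketch. Everything in it is an assumption made for illustration, not the thesis's actual algorithm: the function names, the confidence threshold of 0.8, the toy scoring functions, and in particular the choice to merge the second and third models by summing their scores, since the abstract does not say how their outputs are combined or how reliability is measured.

```python
from typing import Callable, Dict, Tuple

# Hypothetical interface: a guesser maps (word, left neighbor, right neighbor)
# to a dictionary of POS tag -> score.
Scores = Dict[str, float]
Guesser = Callable[[str, str, str], Scores]


def best(scores: Scores) -> Tuple[str, float]:
    """Return the highest-scoring tag and its score."""
    tag = max(scores, key=scores.get)
    return tag, scores[tag]


def combine(word: str, left: str, right: str,
            internal_model: Guesser,
            context_model: Guesser,
            position_model: Guesser,
            threshold: float = 0.8) -> str:
    """Cascade: trust model 1 when confident, else fall back to models 2 and 3."""
    tag, confidence = best(internal_model(word, left, right))
    if confidence >= threshold:  # threshold value is an assumption
        return tag
    # Low-credibility guess: merge the fallback models.
    # Summing scores is an assumed merging rule, not taken from the thesis.
    merged: Scores = {}
    for model in (context_model, position_model):
        for t, s in model(word, left, right).items():
            merged[t] = merged.get(t, 0.0) + s
    return best(merged)[0]


# Toy stand-ins for the three models (illustrative only).
def toy_internal(word: str, left: str, right: str) -> Scores:
    return {"n": 0.9, "v": 0.1} if word.endswith("机") else {"n": 0.5, "v": 0.5}


def toy_context(word: str, left: str, right: str) -> Scores:
    return {"v": 0.7, "n": 0.3} if left == "不" else {"n": 0.6, "v": 0.4}


def toy_position(word: str, left: str, right: str) -> Scores:
    return {"v": 0.6, "n": 0.4}


confident = combine("手机", "的", "很", toy_internal, toy_context, toy_position)
fallback = combine("雷人", "不", "了", toy_internal, toy_context, toy_position)
print(confident, fallback)
```

In this sketch the first word is tagged by the internal-feature model alone, while the second falls below the threshold and is decided by the context and character-position models together. In a real system the three guessers would be trained models rather than hard-coded rules.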
Keywords/Search Tags:Chinese Unknown Words, POS, Context, Internal Features, External Features