
Chinese Word Segmentation Based on Context and Stopwords

Posted on: 2011-11-11 | Degree: Master | Type: Thesis
Country: China | Candidate: Z Z Jiang
GTID: 2178360308972940 | Subject: Computer application technology
Abstract/Summary:
With the continuing advance of national informatization and the spread of the Internet, Chinese information processing has become an active research field. Because word segmentation is the front end of Chinese information processing, the demands placed on segmentation techniques have grown accordingly. Out-of-vocabulary (OOV) words reduce segmentation accuracy more severely than ambiguous words do, so effective OOV recognition can substantially improve Chinese word segmentation. Stopwords interfere with OOV recognition, yet when used appropriately they can also improve segmentation quality. This dissertation addresses these problems. Its main contributions are as follows:
(1) A context-based Chinese word segmentation model. Most previous segmentation algorithms consider either corpus information or context information alone, which introduces a local probability bias. By taking both corpus information and context information into account, the proposed model improves segmentation quality.
(2) Recognition of OOV words based on stopwords (ROWS). Most existing approaches treat stopwords as interference in OOV recognition, and only a few apply rule-based post-processing, with little effect. ROWS models stopwords explicitly, draws on both corpus and context information, and reduces the local probability bias, thereby improving the recognition of unknown words.
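Contribution (1) combines corpus information and context information, but the abstract does not state how. As a minimal sketch, assuming "corpus information" means unigram word frequencies and "context information" means bigram statistics conditioned on the previous word, the two can be linearly interpolated and decoded with a Viterbi search over the word lattice. The toy corpus, the weight `LAMBDA`, the `max_len` limit, and the floor probability for unknown single characters are all hypothetical choices for illustration, not the thesis's actual model.

```python
import math
from collections import Counter

# Hypothetical toy training corpus of pre-segmented sentences (not thesis data).
CORPUS = [
    ["我们", "研究", "中文", "分词"],
    ["中文", "分词", "是", "中文", "信息", "处理", "的", "基础"],
    ["我们", "的", "研究", "很", "重要"],
]
LAMBDA = 0.7  # assumed interpolation weight on context (bigram) evidence

# "Corpus information": unigram word counts.
unigrams = Counter(w for sent in CORPUS for w in sent)
N = sum(unigrams.values())
# "Context information": (previous word, word) counts; "<s>" marks sentence start.
bigrams = Counter(
    pair for sent in CORPUS for pair in zip(["<s>"] + sent, sent)
)
history = Counter()
for (prev, _), count in bigrams.items():
    history[prev] += count


def prob(word, prev):
    """Linearly interpolate context (bigram) and corpus (unigram) probabilities."""
    # Unknown single characters get a small floor so every input is segmentable.
    p_uni = unigrams[word] / N if word in unigrams else 1.0 / (N * 100)
    p_bi = bigrams[(prev, word)] / history[prev] if history[prev] else 0.0
    return LAMBDA * p_bi + (1 - LAMBDA) * p_uni


def segment(text, max_len=4):
    """Viterbi search over a word lattice scored with prob()."""
    # best[i] maps the last word ending at position i to (log score, start, previous word).
    best = [{} for _ in range(len(text) + 1)]
    best[0]["<s>"] = (0.0, None, None)
    for end in range(1, len(text) + 1):
        for start in range(max(0, end - max_len), end):
            word = text[start:end]
            if len(word) > 1 and word not in unigrams:
                continue  # multi-character candidates must be dictionary words
            for prev, (score, _, _) in best[start].items():
                cand = score + math.log(prob(word, prev))
                if word not in best[end] or cand > best[end][word][0]:
                    best[end][word] = (cand, start, prev)
    # Backtrack from the highest-scoring word that ends the text.
    word = max(best[-1], key=lambda w: best[-1][w][0])
    end, words = len(text), []
    while end > 0:
        _, start, prev = best[end][word]
        words.append(word)
        end, word = start, prev
    return words[::-1]


print(segment("中文分词"))  # ['中文', '分词']
```

Because the unigram term keeps every candidate's probability above zero, an unseen word pair does not eliminate a path outright, and a frequent pair can still override raw word frequency. This is one simple way of avoiding reliance on a single information source, which the abstract identifies as the cause of local probability bias.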
Keywords/Search Tags: Chinese information processing, Chinese word segmentation, out-of-vocabulary (OOV) word recognition, stopwords, context