Research Of Chinese Word Segmentation With Conditional Random Fields

Posted on:2009-12-24

Degree:Master

Type:Thesis

Country:China

Candidate:Q Z Shen

Full Text:PDF

GTID:2178360245463706

Subject:Computer application technology

Abstract/Summary:

Over the last decade, Natural Language Processing (NLP) has become an active research field. Because written Chinese does not mark word boundaries, Chinese word segmentation plays a critical role in many Chinese NLP applications and has become a bottleneck in Chinese information processing.

A Conditional Random Field (CRF) is a conditional probabilistic model for labeling and segmenting sequential data. It is an undirected graphical model that defines the conditional probability of the output nodes given the input nodes. It relaxes the strong independence assumptions of generative models such as the Hidden Markov Model (HMM) and overcomes the label-bias problem exhibited by the Maximum Entropy Markov Model (MEMM) and similar discriminative models. CRFs can easily incorporate arbitrary features of the input sequence as well as additional information, such as word-formation rules.

This thesis proposes a CRF-based Chinese word segmentation system, with a focus on the importance of parameter selection and on different tagging strategies. Within the CRF framework, we also explore new features, such as the word formation power of a character. Evaluation on the SIGHAN PKU benchmark corpus shows that the new features improve the F1 score by 3.5% and that the system achieves an F1 score of 94.5%. This suggests that CRFs work well and hold great potential for Chinese word segmentation. In addition, we explore the effect of integrating different models, including CRFs, HMM and MEMM. Evaluation on the same corpus shows that these models are largely complementary: the integrated system achieves an F1 score of 95.6%, substantially outperforming state-of-the-art systems.
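To make the tagging formulation and the F1 metric concrete, the following is a minimal Python sketch. It assumes the common four-tag B/M/E/S character scheme (begin, middle, end, single); the abstract does not say which tagging strategies the thesis compares, so this is only one possibility. The function names are hypothetical, and the F1 here is the usual word-level score computed over matching word spans, not the official SIGHAN scoring script.

```python
def words_to_tags(words):
    """Map segmented words to per-character tags (assumed B/M/E/S scheme)."""
    tags = []
    for word in words:
        if len(word) == 1:
            tags.append("S")  # single-character word
        else:
            # first character B, last character E, anything between M
            tags.extend(["B"] + ["M"] * (len(word) - 2) + ["E"])
    return tags


def tags_to_words(chars, tags):
    """Rebuild words from characters and their B/M/E/S tags."""
    words, current = [], ""
    for ch, tag in zip(chars, tags):
        current += ch
        if tag in ("E", "S"):  # a word closes on E or S
            words.append(current)
            current = ""
    if current:  # tolerate a tag sequence that ends mid-word
        words.append(current)
    return words


def word_spans(words):
    """Return the set of (start, end) character offsets of each word."""
    spans, start = set(), 0
    for word in words:
        spans.add((start, start + len(word)))
        start += len(word)
    return spans


def segmentation_f1(gold_words, pred_words):
    """Word-level F1: a predicted word is correct if its span matches gold."""
    gold, pred = word_spans(gold_words), word_spans(pred_words)
    correct = len(gold & pred)
    if correct == 0:
        return 0.0
    precision = correct / len(pred)
    recall = correct / len(gold)
    return 2 * precision * recall / (precision + recall)


gold = ["我", "喜欢", "自然语言"]
pred = ["我", "喜欢", "自然", "语言"]  # hypothetical system output
print(words_to_tags(gold))
print(segmentation_f1(gold, pred))
```

Under this formulation a sequence labeler such as a CRF predicts one tag per character, and the segmentation is recovered with `tags_to_words`. In the example, two of the four predicted words match the three gold words, so precision is 1/2 and recall is 2/3.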

Keywords/Search Tags:

Natural Language Processing, Chinese Word Segmentation, Conditional Random Fields, Word Formation Power, Model Integration

Related items

1	Research And Application Of Chinese Word Segmentation Based On Conditional Random Fields
2	The Research Of Applying Conditional Random Fields To Chinese Word Segmentation And Part-Of-Speech Tagging
3	Research And Implementation Of Chinese Segmentation System Based On Conditional Random Fields Model
4	Research On Chinese Word Segmentation Based On Deep Learning
5	Research And System Implementation Of Chinese Word Segmentation In Specialized Fields Based On Conditional Random Fields
6	The Research On Chinese Word Segmentation Based On Conditional Random Fields In Big Data Environment
7	Research Of Named Entity Recognition Based On Conditional Random Fields
8	Study On Chinese Word Segmentation Based On Recurrent Neural Network Language Model
9	The Research Of Chinese Word Segmentation Based On CRF
10	Research Of Chinese Word Segmentation With Conditional Random Fields And Implementation