
Study Of Automatic Segmentation Technique Based On Conditional Random Fields

Posted on: 2006-04-12
Degree: Master
Type: Thesis
Country: China
Candidate: Q Chen
GTID: 2168360155958176
Subject: Computer system architecture
Abstract/Summary:
With the development of technology and the explosive growth of information, information processing has become one of the most important areas of technological development today. To extract useful knowledge from this mass of information, machines must be able to "understand" information expressed in human languages. Words are the smallest linguistic units that can be used independently and carry real meaning, so identifying words is the first step in understanding natural language. Only once this step is achieved is it possible to process information in depth and, ultimately, to make machines understand human languages. The research on machine translation and natural language processing in our lab depends largely on sequence labeling and segmentation techniques, such as word segmentation, both to limit the propagation of errors into later processing stages and to support deeper research.
Conditional Random Fields (CRFs), a recently introduced conditional probabilistic model for labeling and segmenting sequential data, are undirected graphical models that compute the conditional probability of the output nodes given the input nodes. They relax the strong independence assumptions required by generative models such as the Hidden Markov Model (HMM), and they overcome the label-bias problem exhibited by the Maximum Entropy Markov Model (MEMM) and other non-generative models. CRFs can easily incorporate arbitrary features of the input sequence as well as implicit properties of the language itself. As a result, besides the transition and emission features used in traditional HMM modeling, other information can be introduced, such as word-formation rules, domain features, and lexicons.
This thesis systematically introduces the definition of CRFs, the structure of the CRF model, feature functions, parameter estimation, and training methods.
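The conditional probability, feature functions, and parameter estimation described above can be made concrete with a small linear-chain CRF. The Python sketch below is illustrative only: it assumes a character-based B/M/E/S tagging scheme for segmentation and hypothetical feature templates (current character with tag, previous tag with tag, next character with tag); none of these names come from the thesis. It computes the unnormalized score of a tag sequence, the partition function Z(x) with the forward algorithm, P(y|x), and the log-likelihood gradient (observed minus expected feature counts) that gradient-based parameter estimation follows.

```python
import itertools
import math
from collections import Counter

# Assumed character tags for segmentation: B(egin), M(iddle), E(nd), S(ingle).
TAGS = ["B", "M", "E", "S"]
START = "<s>"  # pseudo-tag preceding the first character


def features(x, i, prev_tag, tag):
    """Hypothetical feature templates active at position i of string x."""
    feats = [
        ("char", x[i], tag),       # HMM-like emission feature
        ("trans", prev_tag, tag),  # HMM-like transition feature
    ]
    if i + 1 < len(x):
        # Overlapping look-ahead feature: a CRF may condition on any part of the input.
        feats.append(("next_char", x[i + 1], tag))
    return feats


def local(x, i, prev_tag, tag, w):
    """Sum of the weights of the features active at position i."""
    return sum(w.get(f, 0.0) for f in features(x, i, prev_tag, tag))


def score(x, y, w):
    """Unnormalized log-score of tag sequence y for input x."""
    prevs = [START] + list(y[:-1])
    return sum(local(x, i, p, t, w) for i, (p, t) in enumerate(zip(prevs, y)))


def logsumexp(values):
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_partition(x, w):
    """log Z(x) via the forward algorithm, O(len(x) * |TAGS|^2); x must be non-empty."""
    alpha = {t: local(x, 0, START, t, w) for t in TAGS}
    for i in range(1, len(x)):
        alpha = {t: logsumexp([alpha[p] + local(x, i, p, t, w) for p in TAGS]) for t in TAGS}
    return logsumexp(list(alpha.values()))


def prob(x, y, w):
    """P(y | x) = exp(score(x, y)) / Z(x)."""
    return math.exp(score(x, y, w) - log_partition(x, w))


def feature_counts(x, y):
    """How often each feature fires along tag sequence y."""
    prevs = [START] + list(y[:-1])
    c = Counter()
    for i, (p, t) in enumerate(zip(prevs, y)):
        c.update(features(x, i, p, t))
    return c


def gradient(x, y, w):
    """d log P(y|x) / dw: observed feature counts minus expected counts under the model.
    The expectation enumerates all |TAGS|^len(x) sequences, so this is for tiny inputs only;
    practical trainers obtain it from forward-backward marginals."""
    grad = Counter(feature_counts(x, y))
    log_z = log_partition(x, w)
    for seq in itertools.product(TAGS, repeat=len(x)):
        p = math.exp(score(x, seq, w) - log_z)
        for f, n in feature_counts(x, seq).items():
            grad[f] -= p * n
    return grad


if __name__ == "__main__":
    # Hypothetical weights, not trained values.
    w = {("char", "中", "B"): 1.0, ("char", "国", "E"): 1.2, ("trans", "B", "E"): 0.5}
    x = "中国"
    total = sum(prob(x, s, w) for s in itertools.product(TAGS, repeat=len(x)))
    print(round(total, 6))  # → 1.0: P(y|x) sums to 1 over all 16 tag sequences
```

Because Z(x) sums over every tag sequence, the forward recursion is what keeps normalization tractable: it costs O(n·|T|²) instead of the |T|ⁿ enumeration used here only to check the result and to form the expected counts.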
Applying CRFs to Chinese automatic word segmentation, we obtained better performance than the models previously used for sequence labeling and segmentation, and experimentally verified the advantages of the CRF model for these tasks;...
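Once weights are learned, segmenting a sentence amounts to finding the highest-scoring tag sequence and cutting the string at word boundaries. The Python sketch below uses the Viterbi algorithm for decoding and again assumes a B/M/E/S character-tagging scheme; the `local_score` callback and the toy weights are hypothetical stand-ins for a trained CRF, not the thesis's model or data.

```python
from typing import Callable, Dict, List, Sequence

# Assumed character tags: B(egin), M(iddle), E(nd) of a word, S(ingle-character word).
TAGS = ["B", "M", "E", "S"]
START = "<s>"


def viterbi(n: int, local_score: Callable[[int, str, str], float]) -> List[str]:
    """Highest-scoring tag sequence of length n (n >= 1) under a linear-chain model.

    local_score(i, prev_tag, tag) returns the summed feature weights at position i;
    prev_tag is START at i == 0. Runs in O(n * |TAGS|^2).
    """
    best: Dict[str, float] = {t: local_score(0, START, t) for t in TAGS}
    backpointers: List[Dict[str, str]] = []
    for i in range(1, n):
        new_best, pointer = {}, {}
        for t in TAGS:
            # Best predecessor of tag t at position i.
            p = max(TAGS, key=lambda q: best[q] + local_score(i, q, t))
            new_best[t] = best[p] + local_score(i, p, t)
            pointer[t] = p
        best = new_best
        backpointers.append(pointer)
    # Trace back from the best final tag.
    tag = max(TAGS, key=lambda t: best[t])
    path = [tag]
    for pointer in reversed(backpointers):
        tag = pointer[tag]
        path.append(tag)
    return path[::-1]


def tags_to_words(chars: str, tags: Sequence[str]) -> List[str]:
    """Close a word after every E or S tag."""
    words, current = [], ""
    for ch, tag in zip(chars, tags):
        current += ch
        if tag in ("E", "S"):
            words.append(current)
            current = ""
    if current:  # tolerate an ill-formed sequence ending in B or M
        words.append(current)
    return words


# Hypothetical weights standing in for a trained model.
EMISSION = {("中", "B"): 2.0, ("国", "E"): 2.0, ("人", "S"): 1.5}
TRANSITION = {("B", "E"): 1.0, ("E", "S"): 0.5}
sentence = "中国人"


def toy_score(i: int, prev_tag: str, tag: str) -> float:
    return EMISSION.get((sentence[i], tag), 0.0) + TRANSITION.get((prev_tag, tag), 0.0)


tags = viterbi(len(sentence), toy_score)
print(tags, tags_to_words(sentence, tags))  # ['B', 'E', 'S'] ['中国', '人']
```

Viterbi is the forward recursion with the sum replaced by a max plus backpointers, so decoding has the same O(n·|T|²) cost as normalization.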
Keywords/Search Tags: Conditional Random Fields, Automatic Segmentation, Natural Language Understanding, Directed Graph, Undirected Graph, Hidden Markov Model, Maximum Entropy Markov Model, Parameter Estimation