Research And Implement Of Chinese Word Segment Techniques Based On The Conditional Random Field

Posted on:2019-12-27

Degree:Master

Type:Thesis

Country:China

Candidate:X F Xu

GTID:2428330566499245

Subject:Electronic and communication engineering

Abstract/Summary:

With the development of artificial intelligence, robots are gradually entering people's daily lives, and natural language processing is widely used in human-computer interaction. Chinese word segmentation, as a basic technology of natural language processing, is one of the hot spots in the field of artificial intelligence. Current Chinese word segmentation algorithms perform poorly in specific domains, which leads to incorrect semantic understanding. This paper proposes an improved algorithm for the conditional random field (CRF) model that improves the precision and recall of Chinese word segmentation.

First, this paper introduces three mainstream word segmentation methods. After comparing their respective advantages and disadvantages, CRF is selected as the word segmentation model. To address the technical difficulties in Chinese word segmentation research, the overall flow of the word segmentation system is designed.

Second, to address the lack of part-of-speech information in the preprocessing stage of word segmentation, this paper proposes a part-of-speech and lexeme label set (PLLS) and introduces parameters to mark the part of speech. For the CRF, an improved feature template is proposed: in addition to the common features, compound unary feature information is added to improve the recognition of out-of-vocabulary (OOV) words. The stochastic gradient descent (SGD) method is then applied to CRF training, and a method based on feature frequency is proposed to improve the convergence speed of model training. To apply the model prediction algorithm to PLLS, an improved Viterbi algorithm is proposed. In the post-processing stage, a reverse maximum matching (RMM) algorithm based on a trie is used to discover ambiguous words, and three disambiguation methods are proposed for the ambiguous words found.

Finally, a Chinese word segmentation system is implemented in Java. Based on the practical application scenario, a government affairs corpus is collected, constructed and tested, and the test results are analyzed. A comparison with mainstream segmentation tools verifies the validity and practicability of the system.
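
The abstract does not describe how the trie-based reverse maximum matching step is implemented. As an illustration only, the following is a minimal Java sketch of generic reverse maximum matching over a trie, not the thesis's own code. The class name ReverseMaxMatch and the methods addWord and segment are hypothetical, and the sketch assumes that dictionary words are stored in reverse so that the sentence can be scanned from right to left.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of reverse maximum matching (RMM) over a trie.
// It works on Java chars, so characters outside the Basic Multilingual
// Plane are not handled specially.
class ReverseMaxMatch {

    // Trie node. Words are inserted reversed, so a walk from the root
    // follows the characters of a word from its last to its first.
    private static final class Node {
        final Map<Character, Node> children = new HashMap<>();
        boolean isWord;
    }

    private final Node root = new Node();

    // Adds a dictionary word, storing its characters in reverse order.
    public void addWord(String word) {
        Node node = root;
        for (int i = word.length() - 1; i >= 0; i--) {
            node = node.children.computeIfAbsent(word.charAt(i), c -> new Node());
        }
        node.isWord = true;
    }

    // Segments text by repeatedly taking the longest dictionary word that
    // ends at the current position, moving from the end to the start.
    public List<String> segment(String text) {
        List<String> words = new ArrayList<>();
        int end = text.length();
        while (end > 0) {
            Node node = root;
            int bestStart = end - 1; // fallback: emit a single character
            for (int i = end - 1; i >= 0; i--) {
                node = node.children.get(text.charAt(i));
                if (node == null) {
                    break; // no dictionary word extends further to the left
                }
                if (node.isWord) {
                    bestStart = i; // remember the longest match so far
                }
            }
            words.add(text.substring(bestStart, end));
            end = bestStart;
        }
        Collections.reverse(words); // words were collected right to left
        return words;
    }
}
```

With a toy dictionary containing 研究, 研究生, 生命, 命, 的 and 起源, calling segment("研究生命的起源") returns [研究, 生命, 的, 起源], whereas forward maximum matching would first take 研究生. A disagreement of this kind between two segmentations is one common signal of an ambiguous span; whether the thesis detects ambiguity this way is not stated in the abstract.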

Keywords/Search Tags:

Chinese Word Segmentation, Conditional Random Field, Stochastic Gradient Descent method, Viterbi algorithm, Out Of Vocabulary Words, Ambiguous Words

Related items

1	Research And Implementation Of Chinese Segmentation System Based On Conditional Random Fields Model
2	Optimization And Implementation Of Chinese Lexical Analysis Algorithm For Chat Robot
3	Research On Chinese Word Segmentation Algorithm Based On News Text
4	Chinese Word Segmentation Method Based On Dictionary And Statistics Of The Words
5	Research On Words Segmentation Algorithm And Word Variant Extraction Method Of Message Variety Based
6	Research And Application Of Chinese Word Segmentation Method Based On Conditional Random Field
7	The Research And Implementation Of The System For Chinese Word Segmentation Base On Dictionary And Statistic
8	Chinese Multi-category Product Words Segmentation And Recognition Based On Electronic Commerce
9	Research And System Implementation Of Chinese Word Segmentation In Specialized Fields Based On Conditional Random Fields
10	Research On Web Text Segmentation Based On Conditional Random Fields