Research On Chinese Word Segmentation Algorithm Based On Neural Network

Posted on: 2007-08-23 Degree: Master Type: Thesis
Country: China Candidate: X M Zhang Full Text: PDF
GTID: 2178360182460737 Subject: Control theory and control engineering
Abstract/Summary:
Chinese is written as a continuous string of characters with no spaces between words, so sentences are easily misread. This makes information retrieval difficult: a query may return many irrelevant results, or no matching documents at all. Accurate word segmentation is therefore needed to solve these problems.

After extensive study of the linguistic phenomena found in everyday communication, newspapers and magazines, this thesis summarizes the grammatical patterns behind the typical segmentation ambiguities that are common in daily language, and builds a part-of-speech code library used to encode the part of speech of each word. On this basis, the self-organizing and self-learning abilities of a neural network are used to segment ambiguous fragments governed by different rules accurately. The sample space chosen covers essentially all typical kinds of ambiguous fragments. Before training, each word in a fragment is encoded separately using the part-of-speech code library, which converts the grammatical rules contained in the fragment into a data form the neural network can accept. To represent the segmentation, the network's output is interpreted as follows: after extensive training, the segmentation point is determined from the values of the output nodes. Characters, words and abstract grammatical rules are thus mapped to input neurons through their codes, and segmentation patterns are mapped to output neurons, which establishes a conversion from input and output logical concepts to input and output patterns. By training on a large amount of data, the network learns the grammatical rules contained in ambiguous fragments and thereby segments them accurately.

In addition, the BP (backpropagation) algorithm is improved by adding a momentum term to the weight update to adjust the learning speed, which increases the convergence speed.
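The abstract does not spell out the modified update rule. The textbook form of backpropagation with a momentum term, assumed here to be close to what the thesis uses, adds a fraction of the previous weight change to the current gradient step:

$$
\Delta w_{ij}(t) = -\eta \, \frac{\partial E}{\partial w_{ij}} + \alpha \, \Delta w_{ij}(t-1), \qquad
w_{ij}(t+1) = w_{ij}(t) + \Delta w_{ij}(t)
$$

where $\eta$ is the learning rate, $E$ is the network's squared output error, and $0 \le \alpha < 1$ is the momentum coefficient. When successive gradients point the same way the steps accumulate, and when they alternate in sign they partly cancel, which is why a momentum term usually speeds up convergence.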
With these improvements, the segmentation results are noticeably better. After extensive training of a three-layer BP network, experiments show that the algorithm reaches 93.13% training precision and 92.50% test precision on the segmentation of ambiguous fragments, and achieves the expected segmentation quality on general material that was not part of the training set. The method provides a new way of converting input and output logical concepts into input and output patterns, and it overcomes the problem that training is impossible because the number of possible word combinations is unbounded. Applied to word segmentation, it achieves good results.
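The abstract gives no code, tag set, network size or training data, so the following is only a minimal sketch under stated assumptions: a hypothetical four-tag POS code library (`POS_TAGS`), toy three-slot fragments, and a made-up labeling rule. It illustrates the overall idea described above: words become one-hot POS codes for the input layer, a three-layer BP network is trained with a momentum term in each weight update, and the segmentation point is read from whichever output node has the largest value. The names `encode_fragment` and `ThreeLayerBP` are illustrative, not taken from the thesis.

```python
import numpy as np

# Hypothetical POS code library (illustrative only; not the thesis's library).
POS_TAGS = ["n", "v", "a", "d"]  # noun, verb, adjective, adverb


def encode_fragment(tags):
    """Concatenate one-hot POS codes for a fixed-length fragment."""
    width = len(POS_TAGS)
    x = np.zeros(len(tags) * width)
    for i, tag in enumerate(tags):
        x[i * width + POS_TAGS.index(tag)] = 1.0
    return x


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


class ThreeLayerBP:
    """Input -> sigmoid hidden -> sigmoid output, trained by batch
    backpropagation with a momentum term added to every weight update."""

    def __init__(self, n_in, n_hidden, n_out, eta=0.5, alpha=0.9, seed=0):
        rng = np.random.default_rng(seed)
        self.params = [
            rng.uniform(-0.5, 0.5, (n_in, n_hidden)),   # W1
            np.zeros(n_hidden),                         # b1
            rng.uniform(-0.5, 0.5, (n_hidden, n_out)),  # W2
            np.zeros(n_out),                            # b2
        ]
        self.prev = [np.zeros_like(p) for p in self.params]  # previous updates
        self.eta, self.alpha = eta, alpha

    def forward(self, X):
        W1, b1, W2, b2 = self.params
        self.h = sigmoid(X @ W1 + b1)
        return sigmoid(self.h @ W2 + b2)

    def train_step(self, X, T):
        """One batch update; returns the squared error before the update."""
        Y = self.forward(X)
        W2 = self.params[2]
        d_out = (Y - T) * Y * (1 - Y)                   # output-layer deltas
        d_hid = (d_out @ W2.T) * self.h * (1 - self.h)  # hidden-layer deltas
        grads = [X.T @ d_hid, d_hid.sum(0), self.h.T @ d_out, d_out.sum(0)]
        for i, g in enumerate(grads):
            # Momentum: new step = gradient step + alpha * previous step.
            step = -self.eta * g / len(X) + self.alpha * self.prev[i]
            self.params[i] += step
            self.prev[i] = step
        return 0.5 * np.mean(np.sum((Y - T) ** 2, axis=1))


# Toy, made-up training set: label 0 = split after character 1,
# label 1 = split after character 2 (hypothetical rule: split early iff slot 1 is a verb).
samples = [
    (["v", "n", "n"], 0), (["v", "a", "n"], 0), (["v", "n", "d"], 0),
    (["n", "v", "n"], 1), (["a", "n", "v"], 1), (["d", "v", "n"], 1),
]
X = np.array([encode_fragment(tags) for tags, _ in samples])
T = np.eye(2)[[label for _, label in samples]]

net = ThreeLayerBP(n_in=X.shape[1], n_hidden=6, n_out=2)
initial_loss = net.train_step(X, T)
for _ in range(3000):
    final_loss = net.train_step(X, T)

# Read the segmentation point from the output node with the largest value.
split_points = net.forward(X).argmax(axis=1) + 1
accuracy = np.mean(split_points - 1 == T.argmax(axis=1))
print("split after character:", split_points, "accuracy:", accuracy)
```

One-hot codes are used here only because they are the simplest way to turn a symbolic tag into numbers a network accepts. Because the input describes parts of speech rather than specific words, fragments made of different words but sharing a tag pattern map to the same input vector, which is one plausible reading of how POS encoding sidesteps the unbounded number of word combinations.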
Keywords/Search Tags: Chinese Word Segmentation, Natural Language Understanding, Different Meaning, Neural Network, BP Network