Chinese Word Segmentation System Based On Statistics

Posted on:2011-07-22

Degree:Master

Type:Thesis

Country:China

Candidate:X L Li

Full Text:PDF

GTID:2178360305988621

Subject:Computer application technology

Abstract/Summary:

With the rapid development of information technology, Chinese information processing has advanced significantly across many areas of computing. Chinese word segmentation is the foundation of Chinese information processing: words are the intermediate layer that connects sentences to the information processing platform. Segmentation quality therefore directly affects the accuracy of Chinese information processing, and it has become the bottleneck limiting the processing capability of such platforms.

This thesis reviews automatic Chinese word segmentation: its current status, principles, processes, evaluation metrics, and development in China and abroad. After studying the existing segmentation algorithms and analyzing their advantages and disadvantages, it proposes several improvements. A new Chinese word segmentation system based on statistics (CWSSBS) is built using support vector machines together with the vector space model. A support vector machine can build a complex model from a limited number of training samples and gives the system a strong self-learning ability. An inverted dictionary is used so that commonly used words and the most recent out-of-vocabulary words are kept at the highest priority. As a result, the ability of the improved CWSSBS to learn out-of-vocabulary words automatically is effectively improved. Because the dictionary learns by itself under the support vector machine, the system adapts well to unfamiliar environments and is highly portable. Manual and machine monitoring mechanisms can intervene to correct errors in automatic learning in a timely manner.

For ambiguity processing, the thesis uses an improved ambiguity detection method that combines forward matching and reverse matching. During ambiguity resolution, a longest-word-first rule is applied, with the aim of eliminating ambiguities as far as possible.

Simulation results show that the improved CWSSBS substantially improves on the original system in ambiguity resolution and in the self-learning function of the dictionary. However, owing to limits of time and experimental conditions, further research and improvement are needed.
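
The abstract does not describe the thesis's improved matching method in detail, so the following is only a minimal sketch of the textbook idea it builds on: segment the sentence once by forward maximum matching and once by reverse (backward) maximum matching, and treat any disagreement between the two results as a sign of ambiguity. The function names, the toy vocabulary `SAMPLE_VOCAB`, and the `max_len` parameter are all assumptions made for illustration, not taken from the thesis.

```python
# Hypothetical toy dictionary; a real system would load a large lexicon.
SAMPLE_VOCAB = {"研究", "研究生", "生命", "起源"}


def forward_max_match(text, vocab, max_len):
    """Scan left to right, always taking the longest dictionary word."""
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # A single character is accepted even if it is not in the vocab,
            # so the scan always makes progress.
            if size == 1 or piece in vocab:
                tokens.append(piece)
                i += size
                break
    return tokens


def backward_max_match(text, vocab, max_len):
    """Scan right to left, always taking the longest dictionary word."""
    tokens, j = [], len(text)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            piece = text[j - size:j]
            if size == 1 or piece in vocab:
                tokens.append(piece)
                j -= size
                break
    tokens.reverse()  # words were collected from the end of the sentence
    return tokens


def is_ambiguous(text, vocab, max_len):
    """Flag the sentence when the two scan directions disagree."""
    return (forward_max_match(text, vocab, max_len)
            != backward_max_match(text, vocab, max_len))


if __name__ == "__main__":
    sentence = "研究生命起源"
    print(forward_max_match(sentence, SAMPLE_VOCAB, 3))
    print(backward_max_match(sentence, SAMPLE_VOCAB, 3))
    print(is_ambiguous(sentence, SAMPLE_VOCAB, 3))
```

On this sentence the forward scan groups the first three characters into one word, while the backward scan splits them differently, so the sentence is flagged as ambiguous. How the thesis then resolves such cases is not specified beyond the longest-word rule mentioned in the abstract, so no resolution step is sketched here.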

Keywords/Search Tags:

statistical word segmentation, support vector machines, intelligent learning

Related items

1	Studies Of Some Problems In Support Vector Machines And Semi-supervised Learning
2	Medical Image Segmentation Based On Support Vector Machines
3	Research On Some Problems And Applications In Support Vector Machines
4	Image Segmentation And Object Classification Based On Support Vector Machines
5	Research On Improved Support Vector Machines Algorithm And Its Application In Image Segmentation
6	Study Of Support Vector Machines Algorithm Based On Statistical Learning Theory
7	Studies And Application Of Fuzzy And Double Regular Support Vector Machines
8	Triplet Support Vector Machines For Pattern Classification
9	The Models Of SVM And The Applications Of SVM To Image Segmentation
10	Support Vector Machine-based Probability Density Estimation