Chinese Bilateral Translation Between Simplified And Traditional-character Texts Based On Conversion Table And Context

Posted on:2016-09-14

Degree:Master

Type:Thesis

Country:China

Candidate:Z J Pang

Full Text:PDF

GTID:2308330476953333

Subject:Computer Science and Technology

Abstract/Summary:

Chinese is currently written in two character sets: simplified characters, used in mainland China and Singapore, and traditional characters, used in Taiwan, Hong Kong, Macao, and some overseas Chinese communities. As exchange across the Chinese-speaking world increases, the difference between the two sets has become a significant obstacle, and existing Simplified-Traditional conversion technology does not perform well. To address this problem, the thesis proposes a method based on a conversion table and context. The author's earlier work achieved a 95.6% conversion rate in an accuracy evaluation. Building on that work, the thesis studies in more depth the conversion of one simplified character to many traditional characters, which can be viewed as a classification problem. The thesis proposes combining statistical models with rules to solve it. The statistical models used are the SVM (Support Vector Machine), the Maximum Entropy Model, and a Bayes model.
To improve the classification results, the author first proposes a new text feature selection method called ADMMR. Its results are on par with those of Expected Cross-Entropy and the Chi-Square Test, and experiments show that the selected features represent the text well. With the same classification model, ADMMR outperforms the Information Gain method by about 4%. The thesis also proposes using tf-idf feature values in the Maximum Entropy Model instead of binary 0-1 values; experiments show that tf-idf is about 2% better than the 0-1 method.
The author then uses ADMMR, Expected Cross-Entropy, and the Chi-Square Test as feature selection methods, quantifies each feature with tf-idf, and trains an SVM and a Maximum Entropy Model on the training data for each, which yields 6 classification models. A Bayes model trained on the same data gives a 7th classification model. The first 6 models vote, and the class with the most votes is the predicted category; if two or more classes tie for the most votes, the Bayesian model is used to assist the decision. Experimental results show that the combined model is more accurate and more stable than an individual SVM, Maximum Entropy, or Bayesian model. The combination of statistical models and rules is then applied to simplified-to-traditional conversion: the rules convert simplified characters to the corresponding traditional characters according to the thesaurus, and the combined model handles the 3% of words that cannot form a phrase with other words. Experiments show that the method achieves a 98.5% accuracy rate, a better solution to the conversion problem.
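
The voting scheme described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the thesis's implementation: the `Classifier` interface, the function name `combined_vote`, and the stand-in classifiers are all hypothetical, and the abstract does not say whether the Bayesian tie-breaker is restricted to the tied classes, so the sketch simply takes its prediction as-is.

```python
from collections import Counter
from typing import Callable, Sequence

# Hypothetical interface: a classifier maps the context around an ambiguous
# simplified character to one candidate traditional character.
Classifier = Callable[[str], str]


def combined_vote(context: str,
                  voters: Sequence[Classifier],
                  tie_breaker: Classifier) -> str:
    """Majority vote over `voters`; defer to `tie_breaker` when the top count is tied."""
    votes = Counter(clf(context) for clf in voters)
    best = max(votes.values())
    leaders = [label for label, count in votes.items() if count == best]
    if len(leaders) == 1:
        return leaders[0]
    # Assumption: on a tie, the tie-breaker's prediction is used unchanged.
    return tie_breaker(context)


if __name__ == "__main__":
    # Stand-ins for the six trained models (three feature selection methods
    # times SVM and Maximum Entropy) and for the Bayes model. Simplified 发
    # corresponds to traditional 發 or 髮 depending on context.
    says_fa = lambda ctx: "發"
    says_hair = lambda ctx: "髮"
    voters = [says_fa, says_fa, says_fa, says_fa, says_hair, says_hair]
    print(combined_vote("出发", voters, tie_breaker=says_hair))
```

In a real system each voter would wrap a trained model together with its own feature selection and tf-idf weighting; only the vote-counting and tie-breaking logic is shown here.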

Keywords/Search Tags:

simplified and traditional characters conversion, simplified and traditional one to many transformations, combination model, maximum entropy, SVM, GIS, ADMMR, feature selection

Related items

1	The Design And Implementation Of Plugin For Web Simplified And Traditional Forms Of Chinese Characters Conversion
2	Research On The X3D Model-Based Simplified Algorithm
3	Image Reconstruction Algorithm Based On Simplified Numerical Optimization Back Propagation Neural Network
4	The Key Parameters Of Simplified Spice Model For Small Size Mosfets
5	The Key Parameters Of Simplified SPICE Model For Small Size MOSFETs
6	Research On Simplified Algorithm Of Three-Dimensional Grid Model Oriented To Mobile Terminal
7	Maximum Entropy Method And Its Applications In Natural Language Processing
8	The Estimation Of Oil Thickness Based On Simplified Hapke Model
9	Research On WSN Protocol Model Of MAC Based On Simplified Event Evaluation Model And Cooperative Communication
10	Multi-model Simplification Based On Effect Maintenance