Study On Recognition Of Chinese Proper Noun

Posted on:2007-02-23

Degree:Master

Type:Thesis

Country:China

Candidate:T T Mao

Full Text:PDF

GTID:2178360212957107

Subject:Computer software and theory

Abstract/Summary:

Chinese proper noun recognition is an important technique for improving the accuracy of word segmentation. The main task of this thesis is to study and implement an effective approach to extracting proper nouns from Chinese texts.

Based on a survey and analysis of current identification methods for Chinese proper nouns, this thesis builds a recognition model based on the support vector machine (SVM) and presents four methods for improving SVM performance: (1) an algorithm combining SVM with a statistical method, (2) a modified SVM and K-nearest-neighbors (KNN) algorithm, (3) a modified SVM algorithm, and (4) a cluster SVM algorithm.

An analysis of the classification results obtained by SVM alone shows that most misclassified test samples lie near the decision plane. To increase accuracy, a hybrid model combining SVM with a statistical approach is proposed: in the region near the decision plane, the statistical method classifies the samples instead of SVM, and in the region far from the decision plane, SVM is used.

A modified SVM-KNN classifier, combining SVM with a modified KNN, is built in the same way: different classifiers are applied to test samples depending on where they fall in the feature space. The modified KNN classifier adapts classic KNN to unbalanced data.

Because the training set is unbalanced (the negative samples are significantly outnumbered by the positive ones), which degrades SVM performance, a modified SVM classifier for identifying Chinese proper nouns is proposed. An algorithm called boundary movement is used to modify the SVM.

A cluster SVM algorithm is also proposed to reduce classification mistakes caused by the imbalance between the two classes of samples in the training set. In this algorithm, the training set is first clustered using kernel-based K-means clustering, and an SVM model is then trained on the clustered training set.

In this thesis, first, according to the characteristics of Chinese proper nouns, the words in the texts are segmented and assigned part-of-speech (POS) tags, and a training set is constructed by extracting feature vectors. Second, four Chinese proper noun recognition models are built based on the four methods above. Lastly, the final identification results of the testing...
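
The region-based idea behind the hybrid classifiers described in the abstract (use SVM far from the decision plane, and a second classifier near it) might be sketched roughly as follows. This is only an illustration under stated assumptions, not the thesis implementation: the linear decision function with weights `W` and bias `B` is a stand-in for a trained SVM, the fallback is plain majority-vote KNN rather than the modified KNN or the statistical method of the thesis, and the names `svm_score`, `knn_label`, `hybrid_predict`, the margin `eps`, and the toy data are all hypothetical.

```python
import math

def svm_score(x, w, b):
    """Signed score w.x + b of a linear decision function (stand-in for a trained SVM)."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def knn_label(x, train, k=3):
    """Majority vote (+1 / -1) among the k training points nearest to x (Euclidean)."""
    nearest = sorted(train, key=lambda pair: math.dist(x, pair[0]))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if votes > 0 else -1

def hybrid_predict(x, w, b, train, eps=0.5, k=3):
    """Trust the SVM sign when |score| >= eps; otherwise fall back to KNN."""
    score = svm_score(x, w, b)
    if abs(score) < eps:          # sample lies near the decision plane
        return knn_label(x, train, k)
    return 1 if score > 0 else -1  # sample lies far from the decision plane

# Hypothetical toy data: the plane is x1 = 0, with a pocket of negative
# samples sitting just on the positive side of it.
W, B = (1.0, 0.0), 0.0
TRAIN = [
    ((0.2, 0.0), -1),
    ((0.3, 0.1), -1),
    ((0.25, -0.1), -1),
    ((2.0, 0.0), 1),
    ((-2.0, 0.0), -1),
]

if __name__ == "__main__":
    for point in [(0.1, 0.0), (3.0, 0.0), (-3.0, 0.0)]:
        print(point, hybrid_predict(point, W, B, TRAIN))
```

In this toy setup the point `(0.1, 0.0)` has a small positive score, so the sign of the decision function alone would label it positive, while the KNN fallback labels it negative because its three nearest training samples are negative. The width `eps` of the near-plane region is a free parameter in this sketch; the abstract does not say how the thesis chooses it.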

Keywords/Search Tags:

Chinese Proper Noun, Statistical Method, Modified SVM-KNN, Modified SVM, Clustering

Related items

1	Proper Noun Recognition With Transformation-based Learning
2	The Name Of The Automatic Identification Of Chinese Institutions
3	Fabrication And Gas Sensing Performance Of WO₃/MoO₃ Modified NiO-based Gas Sensors
4	Nonlinear Profile Monitoring Using B-spline And Modified Clustering Analysis
5	Chinese Network Spread Of Genetically Modified Food Problems And Countermeasures
6	Preparation, Characterization, And Adsorption Properties Of SiO₂ And Magnetic Particles Modified With Organosilanes
7	Positron Emission Tomography Statistical Iterative Method And The Accelerated Method
8	Glucose Biosensor Based On Carbon Nanotubes Modified Of Study
9	Research On Photoelectric Properties Of Modified Ge Materials
10	Natural Language Understanding Of Chinese Classifiers And Noun Collocations And Troubleshooting Systems