Application Of The Clustering Analysis In The Large Vocabulary Chinese Character Recognition

Posted on:2008-09-07

Degree:Master

Type:Thesis

Country:China

Candidate:J Yang

GTID:2178360245475360

Subject:Communications and information processing

Abstract/Summary:

With the rapid development of science, the need to analyze and manage tremendous amounts of data has become increasingly important. Clustering analysis is used to discover models (underlying structure) in large data sets, and it has been widely applied in data mining, pattern recognition, image processing and other fields. This thesis mainly studies the application of clustering techniques to large-scale recognition.

First, we summarize the principles, structure and basic ideas of clustering in detail. The many clustering algorithms that have been investigated can be classified into several types: partitioning, hierarchical, density-based and model-based clustering. Each type has its own advantages and disadvantages, and each has been improved in different respects by different researchers. In Chapter 3 we study three classic clustering algorithms (K-means, LVQ and kernel clustering) and also experiment with MLVQ, an improved LVQ algorithm. Finally, we select the K-means algorithm for large-vocabulary HCC recognition. Two feature extraction methods were used in the experiments: Gabor features and gradient features. The experiments show that gradient features give higher recognition accuracy than Gabor features, and that accuracy can be improved further with LDA.

During our research we found that the codebook produced by clustering requires a lot of memory and that recognition takes a long time. Both are obstacles to deploying large-scale recognition in real-world applications. We therefore employ the split VQ algorithm and a two-layer clustering algorithm to reduce the time and space costs of recognition. Experiments show that these two algorithms preserve recognition accuracy while greatly reducing recognition time and codebook memory.

Conventional K-means needs to know the exact number of clusters before clustering the data; otherwise, it may perform poorly.
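
The abstract does not describe the recognizer in code. The K-means codebook approach can be illustrated with a minimal sketch. It assumes one K-means codebook per character class and a nearest-prototype classifier using squared Euclidean distance. The names `kmeans`, `build_codebook` and `classify` are hypothetical. The 2-D toy vectors stand in for real Gabor or gradient feature vectors, which may also be LDA-reduced.

```python
import random
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Vector], k: int, iters: int = 50, seed: int = 0) -> List[Vector]:
    """Lloyd-style K-means; returns k centroids. Requires len(points) >= k."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]  # initialize on random samples
    for _ in range(iters):
        # Assignment step: each point joins the cluster of its nearest centroid.
        groups: List[List[Vector]] = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: sq_dist(p, centroids[j]))
            groups[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        updated = [
            [sum(col) / len(g) for col in zip(*g)] if g else centroids[j]  # empty cluster: keep old centroid
            for j, g in enumerate(groups)
        ]
        if updated == centroids:  # converged: centroids stopped moving
            break
        centroids = updated
    return centroids


def build_codebook(samples: Dict[str, List[Vector]], k: int) -> List[Tuple[str, Vector]]:
    """Cluster each class separately; every centroid becomes a labeled prototype."""
    codebook: List[Tuple[str, Vector]] = []
    for label, vecs in samples.items():
        for centroid in kmeans(vecs, min(k, len(vecs))):
            codebook.append((label, centroid))
    return codebook


def classify(x: Vector, codebook: List[Tuple[str, Vector]]) -> str:
    """Nearest-prototype rule: return the label of the closest codebook entry."""
    return min(codebook, key=lambda entry: sq_dist(x, entry[1]))[0]


# Toy usage: two "character classes" with 2-D stand-in feature vectors.
samples = {
    "A": [[0.0, 0.0], [0.2, 0.1], [5.0, 5.0], [5.1, 4.9]],
    "B": [[10.0, 0.0], [10.2, 0.3], [9.8, -0.1]],
}
codebook = build_codebook(samples, k=2)
print(len(codebook), classify([0.1, 0.0], codebook), classify([10.0, 0.1], codebook))  # prints: 4 A B
```

Every input is compared with every prototype. Memory use and matching time therefore both grow with (number of classes) × `k`, which is the codebook-size and recognition-time problem the abstract raises for large character sets. Note also that `k` must be fixed before clustering.
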
The Rival Penalized Competitive Learning (RPCL) algorithm can select the correct number of clusters automatically, but it is sensitive to the learning rate and the de-learning rate, especially the de-learning rate. Chapter 5 presents an improved RPCL algorithm based on evaluating the competitive ability of the winner relative to the rival. It can determine the number of clusters without requiring a de-learning rate to be chosen. Our experiments show that the improved algorithm finds the correct clustering more quickly and conveniently than the original RPCL algorithm...
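
For reference, the classic RPCL update can be sketched as below. This is the baseline algorithm, not the improved algorithm of Chapter 5, whose details the abstract does not give. The sketch makes these assumptions:

- squared Euclidean distance;
- frequency-sensitive winner selection, weighted by each unit's relative winning count;
- a fixed learning rate `alpha_c` and a fixed de-learning rate `alpha_r`.

The function names, the `count_active` heuristic for reading off the cluster number, and all default values are illustrative only.

```python
import random
from typing import List, Sequence, Tuple

Vector = List[float]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def rpcl_step(x: Vector, units: List[Vector], wins: List[int],
              alpha_c: float, alpha_r: float) -> Tuple[int, int]:
    """One RPCL update: attract the winner to x, push the rival away from x."""
    total = sum(wins)
    # Frequency-sensitive competition: frequent winners get a larger effective distance.
    scores = [wins[j] / total * sq_dist(x, u) for j, u in enumerate(units)]
    ranked = sorted(range(len(units)), key=scores.__getitem__)
    c, r = ranked[0], ranked[1]  # winner and rival (second winner)
    units[c] = [w + alpha_c * (xi - w) for w, xi in zip(units[c], x)]  # learning
    units[r] = [w - alpha_r * (xi - w) for w, xi in zip(units[r], x)]  # de-learning
    wins[c] += 1
    return c, r


def rpcl(points: List[Vector], k_max: int, alpha_c: float = 0.05,
         alpha_r: float = 0.002, epochs: int = 50,
         seed: int = 0) -> Tuple[List[Vector], List[int]]:
    """Run RPCL with k_max units initialized on random data points (needs k_max >= 2)."""
    rng = random.Random(seed)
    units = [list(p) for p in rng.sample(points, k_max)]
    wins = [1] * k_max  # winning counts, started at 1 to avoid zero weights
    order = list(range(len(points)))
    for _ in range(epochs):
        rng.shuffle(order)  # present samples in a new random order each epoch
        for i in order:
            rpcl_step(points[i], units, wins, alpha_c, alpha_r)
    return units, wins


def count_active(points: List[Vector], units: List[Vector], min_count: int = 1) -> int:
    """Hypothetical cluster-number estimate: units nearest to at least min_count points."""
    owned = [0] * len(units)
    for p in points:
        owned[min(range(len(units)), key=lambda j: sq_dist(p, units[j]))] += 1
    return sum(1 for n in owned if n >= min_count)


if __name__ == "__main__":
    rng = random.Random(1)
    # Toy data: three tight 2-D blobs, clustered with more units than clusters.
    centers = [(0.0, 0.0), (5.0, 5.0), (10.0, 0.0)]
    data = [[cx + rng.gauss(0, 0.3), cy + rng.gauss(0, 0.3)]
            for cx, cy in centers for _ in range(30)]
    units, wins = rpcl(data, k_max=6)
    print("active units:", count_active(data, units, min_count=5))
```

`alpha_r` is the de-learning rate to which, according to the abstract, RPCL is sensitive. In this sketch it stays a fixed hand-chosen constant, and that choice is what the improved algorithm is meant to remove.
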

Keywords/Search Tags:

Clustering Analysis, Character Recognition, K-means Algorithm, Feature Extraction

Related items

1	The Algorithm Of Clustering Based RBF-LBF NN And Its Application
2	Video Character Recognition Technology Research And Application
3	Vehicle License Plate Recognition System, Key Technology Research
4	Application Of The Clustering Analysis In Handwritten Chinese Character Recognition
5	Application Research On BWS-SOM Model In Chinese Recognition In Large Character Set
6	Research On The Improvement Of C-means Clustering Algorithm
7	Research On Feature Extraction And Matching Recognition Of Printed Chinese Character Recognition System
8	Research On Text Clustering Algorithm Based On K-Means
9	The Research Of K-means Clustering Algorithm Improvement
10	Analysis And Research Of Handwritten Character Clustering Based On Affinity Propagation Clustering Algorithm