Clean-up, Statistics and Experimental Analysis of a Large-Scale Online Chinese Handwriting Recognition Dataset

Posted on: 2013-07-01
Degree: Master
Type: Thesis
Country: China
Candidate: H Zhang
Full Text: PDF
GTID: 2248330374974997
Subject: Communication and Information System
Abstract/Summary:
Handwritten character recognition is a pattern recognition problem with a large (or very large) number of classes. Chinese character recognition draws on image processing, artificial intelligence, formal languages and automata, and other disciplines, which makes it a comprehensive technology. Training the core recognition algorithms requires samples on a certain scale, and the collection of these samples is called a sample database. A Chinese character sample database also allows different recognition methods to be evaluated in a unified and objective way. Because the experimental data directly affect the performance of a recognition system, building a large-scale, broadly representative database is both the premise and basis of research on handwritten character recognition and a guide for improving recognition systems.

Compared with handwriting recognition algorithms, however, handwriting databases have developed relatively slowly. The handwritten Chinese character data currently available are limited, storage formats differ from one database to another, and early databases suffer from a single sample type, an insufficient number of samples, and a lack of unconstrained samples. This thesis uses SCUT gPen, a large-scale, online, unconstrained database developed by the HCII laboratory of South China University of Technology. The gPen data collected over the network are cleaned up and analysed statistically, and a large number of experiments are run on the cleaned data, with the aim of improving the adaptability, stability and recognition rate of the recognition system. The main work and innovations of this thesis are:

(1) Completing the first phase of the clean-up of the SCUT gPen simplified Chinese database. The thesis first introduces the source and characteristics of the gPen database and, based on those characteristics, sets out a detailed and feasible clean-up strategy. From an analysis of the workload it determines the number of personnel, the distribution of work, the schedule, the clean-up standards and the inspection and acceptance procedure, and combines these with experiments on the data. The specific steps are: machine recognition first; then classification of the samples (correct as the first candidate, within the top 10 candidates, or beyond the top 10 candidates); division of labour for manual sorting; merging of the cleaned data; and comparative experiments.

(2) Carrying out extensive basic statistics on the gPen data and analysing the characteristics of the gPen samples, for example the distribution of samples in the database and the distribution of stroke counts per sample. Statistics are also given for the seven different types of sample encountered during the clean-up. Screenshots of the different sample types show more intuitively that the database is large-scale, diverse and unconstrained, and indicate both the complexity and the necessity of the clean-up work.

(3) Testing the cleaned gPen database as a whole and running comparative experiments from four separate aspects. The experiments show that the gPen database contains a large number of samples and users and strongly personalized (unconstrained) writing, and that the recognition rate of the system is greatly improved. They also show that SCUT gPen has the important characteristics of a good standard database of handwritten Chinese characters, and that it has good experimental and test value as a database for online handwritten character recognition.
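The classification step in (1) can be illustrated with a minimal sketch. It assumes that each sample carries a collected label and that a recognizer returns a ranked candidate list; the names `triage`, `triage_all` and the three bucket names are hypothetical and do not come from the thesis, and the reading of the three categories (first candidate, within the top 10, beyond the top 10) is an interpretation of the abstract.

```python
# Hypothetical sketch: bucket samples by where the collected label
# falls in a recognizer's ranked candidate list (assumed input format).

def triage(label, candidates, top_n=10):
    """Return the bucket name for one sample."""
    if candidates and candidates[0] == label:
        return "first_candidate"   # recognizer agrees with the label
    if label in candidates[:top_n]:
        return "within_top_n"      # label is among the top-n candidates
    return "beyond_top_n"          # label not found in the top-n candidates


def triage_all(samples, top_n=10):
    """Group sample ids by bucket; samples are (id, label, candidates)."""
    buckets = {"first_candidate": [], "within_top_n": [], "beyond_top_n": []}
    for sample_id, label, candidates in samples:
        buckets[triage(label, candidates, top_n)].append(sample_id)
    return buckets


if __name__ == "__main__":
    demo = [
        ("s1", "大", ["大", "太", "犬"]),
        ("s2", "大", ["太", "大", "犬"]),
        ("s3", "大", ["太", "犬"]),
    ]
    for name, ids in triage_all(demo).items():
        print(name, ids)
```

Under this reading, samples in the last bucket are the ones most likely to be mislabelled or badly written, so they would presumably need the most manual attention when the work is divided up.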
Keywords/Search Tags: Online Chinese Handwriting Recognition, Chinese Dataset, Clean-up, Statistics, Experimental Analysis