
The Research Of Recognition Methods In Handwritten Chinese Short Answer Question Based On Data Augmentation

Posted on: 2021-04-01
Degree: Master
Type: Thesis
Country: China
Candidate: Q Shen
GTID: 2428330605475965
Subject: Software engineering
Abstract/Summary:
Automatic marking is an information technology with high social and productive value. At present, the most popular technique uses answer cards to collect and recognize answers to multiple-choice questions, and there is still no effective method for short answer questions written in Chinese. To address the low accuracy and the difficulty of recognizing handwritten Chinese text, this thesis proposes solutions for recognizing handwritten Chinese short answer questions based on data augmentation and convolutional neural networks.

For the problem of segmenting handwritten Chinese characters, a multi-step character segmentation method is proposed. A polynomial curve is fitted to the vertical projection histogram of the image, and a dividing line is placed at each minimum point of the fitted function. The width-to-height ratio of each resulting image segment is then calculated to distinguish character types of different structures, including over-segmented characters and sticky (touching) characters. Finally, a threshold merging algorithm and the drop-fall algorithm are applied to these cases, which effectively improves segmentation accuracy.

An original-set recognition method is proposed to adapt to the sample distribution in the test paper environment. Samples collected from past answer papers form an original set that is used as the training set, which improves the recognition performance of the model.

A data augmentation method for small-scale handwritten Chinese character datasets is proposed. Considering the impact of the training data and of class balance on model performance, 10 data augmentation algorithms are introduced to expand the samples and balance the number of samples per class in the original set, which increases sample diversity and reduces overfitting. To improve on the poor performance of traditional data augmentation methods and of DCGAN, a combined method called X-DCGAN is proposed. X-DCGAN makes full use of the advantages of both traditional augmentation methods and generative methods, so it can effectively expand and enhance small-scale datasets.
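
The first segmentation step named in the abstract (fit a polynomial to the vertical projection histogram, then cut at its minima, then check the width-to-height ratio of each piece) can be illustrated with a minimal NumPy sketch. This is not the thesis implementation: the function names, the polynomial degree, and the ratio thresholds `narrow` and `wide` are all assumptions chosen for illustration, and the threshold merging and drop-fall stages are not shown.

```python
import numpy as np

def vertical_projection(binary):
    """Count ink pixels in each column of a binary image (ink = 1, background = 0)."""
    return binary.sum(axis=0)

def split_positions(binary, degree=8):
    """Return column indices where the fitted polynomial has an interior local minimum.

    `degree` is a hypothetical tuning parameter; in practice it would have to grow
    with the number of characters expected in the line image.
    """
    hist = vertical_projection(binary).astype(float)
    x = np.arange(hist.size)
    # Polynomial.fit rescales x internally, which keeps the least-squares fit stable.
    poly = np.polynomial.Polynomial.fit(x, hist, degree)
    fitted = poly(x)
    # A column is a candidate cut if the fitted curve is lower there than at its
    # left neighbour and no higher than at its right neighbour.
    is_min = (fitted[1:-1] < fitted[:-2]) & (fitted[1:-1] <= fitted[2:])
    return (np.flatnonzero(is_min) + 1).tolist()

def classify_segment(width, height, narrow=0.6, wide=1.4):
    """Label a segment by its width-to-height ratio (thresholds are assumed values)."""
    ratio = width / height
    if ratio < narrow:
        return "over-segmented"  # likely a fragment to merge with a neighbour
    if ratio > wide:
        return "sticky"          # likely several touching characters
    return "single"

# Synthetic line image: two ink blobs separated by an empty gap (columns 16 to 23).
image = np.zeros((10, 40), dtype=np.uint8)
image[:, 2:16] = 1
image[:, 24:38] = 1
cuts = split_positions(image, degree=4)
```

On real handwriting the projection is noisy, so the fitted curve acts as a smoother; segments flagged as over-segmented or sticky would then be passed to the merging and drop-fall steps described in the abstract.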
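
The idea of using augmentation both to expand a small dataset and to balance the number of samples per class can be sketched as below. The abstract does not list the 10 augmentation algorithms, so the two transforms here (`shift` and `add_noise`) are stand-ins chosen for illustration, and `balance_classes` is a hypothetical helper, not the method from the thesis. Images are assumed to be binary arrays with values 0 and 1.

```python
import numpy as np

def shift(img, rng, max_shift=2):
    """Translate the image by a few pixels (np.roll wraps pixels around the border)."""
    dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
    return np.roll(img, (int(dy), int(dx)), axis=(0, 1))

def add_noise(img, rng, rate=0.02):
    """Flip a small random fraction of pixels in a 0/1 image."""
    mask = rng.random(img.shape) < rate
    return np.where(mask, 1 - img, img).astype(img.dtype)

# Illustrative stand-ins for the augmentation algorithms mentioned in the abstract.
AUGMENTERS = [shift, add_noise]

def balance_classes(images, labels, seed=0):
    """Add augmented copies until every class has as many samples as the largest one."""
    rng = np.random.default_rng(seed)
    label_array = np.asarray(labels)
    classes, counts = np.unique(label_array, return_counts=True)
    target = int(counts.max())
    out_images, out_labels = list(images), list(labels)
    for cls, count in zip(classes.tolist(), counts.tolist()):
        pool = [img for img, lab in zip(images, labels) if lab == cls]
        for _ in range(target - count):
            source = pool[int(rng.integers(len(pool)))]
            augment = AUGMENTERS[int(rng.integers(len(AUGMENTERS)))]
            out_images.append(augment(source, rng))
            out_labels.append(cls)
    return out_images, out_labels

# Example: class 0 has four samples, class 1 has one.
samples = [np.zeros((8, 8), dtype=np.uint8) for _ in range(5)]
sample_labels = [0, 0, 0, 0, 1]
balanced_images, balanced_labels = balance_classes(samples, sample_labels)
```

A generative model such as DCGAN could be added as one more source of synthetic samples in the same loop, which is the general direction the abstract describes for X-DCGAN; how the thesis actually combines the two is not stated in the abstract.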
Keywords/Search Tags:automatic marking, character segmentation, handwritten Chinese character recognition, convolutional neural network, data augmentation