| Offline handwritten Chinese text recognition (HCTR) is a long-standing research topic whose challenges include a large character set (more than 7,000 classes), diverse writing styles, and touching characters. Data preprocessing and augmentation techniques are therefore a natural way to build practical and robust offline HCTR systems. With the rise of deep learning, researchers have begun to use deep learning based methods instead of traditional segmentation-based methods to solve offline HCTR problems. However, deep learning based methods have two fundamental issues: they require large amounts of data and substantial computational resources. To address these problems, our work and main contributions can be summarized as follows. 1. We propose a data preprocessing and augmentation method and a novel CNN-ResLSTM character model. The former addresses writing style diversity and slanted text lines. By randomly generating training samples, shuffling the characters in training samples, and mix-training with synthesized text samples, we can train the character model adequately. In the post-processing module, we adopt a language model to decode and correct the final result. Experiments show that the proposed method achieves state-of-the-art results among published methods and is robust to different writing styles and habits. 2. The CNN-ResLSTM model above has high computation and storage costs and is inconvenient to deploy on mobile devices, so we apply acceleration and compression methods to it. For the convolutional layers, LSTM layers, and inner product layers in the model, we first adopt Tucker decomposition and SVD to accelerate the model, which also provides some compression. We then use the proposed Adaptive Drop Weight (ADW) method to compress the decomposed model further. In total, we compress the CNN-ResLSTM model by a factor of 21.8 and accelerate it by a factor of 2.2 in practice. The compressed model can be deployed on mobile devices. Notably, the compressed model still achieves the best results among published methods. |
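
To illustrate why an SVD-style factorization of an inner product layer both compresses and accelerates it, the sketch below counts weights for a dense m x n layer and for its rank-k replacement (two smaller layers of sizes m x k and k x n). This is generic low-rank arithmetic, not the configuration used in this work: the layer sizes and rank are hypothetical, and the function names are invented for illustration.

```python
def dense_cost(m, n):
    """Weights (and multiply-accumulates per input) of a dense m x n layer."""
    return m * n


def low_rank_cost(m, n, k):
    """Cost after replacing the m x n layer with an m x k and a k x n layer."""
    return k * (m + n)


def compression_ratio(m, n, k):
    """How many times smaller the rank-k factorization is than the dense layer."""
    return dense_cost(m, n) / low_rank_cost(m, n, k)


def max_useful_rank(m, n):
    """Largest rank k for which the factorization is strictly cheaper."""
    return (m * n - 1) // (m + n)


# Hypothetical sizes, chosen only to make the arithmetic easy to follow.
m, n, k = 1024, 1024, 128
print(dense_cost(m, n))            # 1048576
print(low_rank_cost(m, n, k))      # 262144
print(compression_ratio(m, n, k))  # 4.0
print(max_useful_rank(m, n))       # 511
```

The same count applies to multiply-accumulate operations, which is why a factorization of this kind reduces computation as well as storage. The measured speedup on a device can be smaller than this theoretical ratio.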