| Optical character recognition (OCR) is widely used to recognize ID cards, driver's licenses, business cards, and other cards. Compared with ID cards and other fixed-layout documents, business cards come in many styles, which makes information extraction and recognition more difficult. Most traditional OCR algorithms rely on hand-crafted features, and template matching generalizes poorly, so they perform poorly on business card recognition. Natural language processing (NLP) handles semantic understanding across different contexts well and is robust and widely applicable. This paper proposes a business card recognition method that uses text classification techniques from NLP as the OCR back end. It studies in depth the fusion of character-level and word-level text features and explores the application of neural architecture search and automatic hyperparameter tuning to text classification. The main contributions are as follows: (1) A business card recognition method combining logical rule matching with text classification is adopted to address classification errors caused by ambiguous words and incomplete recognition after long-text truncation. The character-level LSTM in the OCR recognizer is replaced with LSTM+CRF, which provides some error-correction ability. Compared with the traditional method, the recognition rate for whole business cards improves from 86% to 94%, with a server-side recognition time of 600-800 ms. (2) The Lattice LSTM model from named entity recognition is applied to text classification, fusing character-level and word-level text representations within the model. The Lattice LSTM is optimized for speed to support mini-batch training and prediction, and convolution layers with different kernel sizes are applied after the embedding layer to capture different n-gram information. Compared with TextCNN, recognition accuracy improves from 97.4% to 98.7%, and the model runs nearly ten times faster than the original Lattice LSTM. (3) Based on DARTS, a search space suited to text classification is designed. After the best network structure is found, Bayesian optimization is used to search for the best hyperparameter combination. Experiments show that the resulting model reaches an accuracy of 98.84%, close to that of the Lattice LSTM model. |
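
The combination of logical rule matching and text classification described in contribution (1) can be illustrated with a minimal sketch. It is not the paper's implementation: the rule patterns, the field labels, and the names `RULES`, `match_rule`, `classify_line`, and `toy_classifier` are all hypothetical, and the keyword-based `toy_classifier` only stands in for a trained text classifier. The idea shown is that fields with a rigid surface form are decided by rules first, and only the remaining lines are passed to the classifier.

```python
import re
from typing import Callable, Optional

# Hypothetical rules for fields with a rigid surface form.
# Order matters: the first matching rule decides the label.
RULES = [
    ("email", re.compile(r"[\w.+-]+@[\w-]+(\.[\w-]+)+")),
    ("url", re.compile(r"(https?://|www\.)\S+", re.IGNORECASE)),
    ("phone", re.compile(r"\+?\d[\d\s-]{6,}\d")),
]


def match_rule(line: str) -> Optional[str]:
    """Return the label of the first rule that matches, or None."""
    for label, pattern in RULES:
        if pattern.search(line):
            return label
    return None


def toy_classifier(line: str) -> str:
    """Placeholder for a trained text classifier (assumption, not the paper's model)."""
    keywords = {
        "manager": "title",
        "engineer": "title",
        "ltd": "company",
        "inc": "company",
    }
    lowered = line.lower()
    for keyword, label in keywords.items():
        if keyword in lowered:
            return label
    return "other"


def classify_line(line: str, classifier: Callable[[str], str]) -> str:
    """Apply rule matching first; fall back to the classifier when no rule fires."""
    label = match_rule(line)
    return label if label is not None else classifier(line)


if __name__ == "__main__":
    card_lines = [
        "Jane Doe",
        "Senior Engineer",
        "Acme Trading Co., Ltd",
        "Tel: +86 138-0000-0000",
        "jane.doe@example.com",
        "www.example.com",
    ]
    for text in card_lines:
        print(f"{text!r}: {classify_line(text, toy_classifier)}")
```

Putting the rules ahead of the classifier keeps unambiguous fields such as e-mail addresses away from the statistical model, which is one plausible way to reduce the errors from ambiguous words that the abstract mentions.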