
Research On Chinese And English Text Recognition Algorithms In Natural Scenes Based On Attention Mechanism

Posted on: 2022-03-16
Degree: Master
Type: Thesis
Country: China
Candidate: C Song
Full Text: PDF
GTID: 2518306731477814
Subject: Computer technology
Abstract/Summary:
Unlike printed text, which is neatly arranged on a clean background, characters in natural scene images have cluttered backgrounds, random placement, varying lengths and sizes, and diverse colors and fonts. Traditional optical character recognition (OCR) cannot meet practical needs in this setting. The attention mechanism is widely used in the encoder-decoder framework of current deep network models for text recognition. Given a query vector, it computes the correlation between the query and each input vector to obtain the importance of each input, so that important information can be selected from a large amount of information. However, conventional attention cannot tell whether, or how strongly, the query is actually related to the input vectors, so it may produce results that the downstream task does not expect and thereby mislead it. This paper introduces the "Attention on Attention" (AoA) module, which extends the conventional attention mechanism to determine the correlation between the attention result and the query. A model based on the encoder-decoder architecture is constructed to solve the problem of text recognition in natural scenes. The model consists of five stages: transformation, feature extraction, refinement, sequence modeling, and prediction. The AoA module is added to both the refinement module and the prediction module. In the refinement module, AoA helps to better model the relationships between different characters in the text image; in the prediction module, AoA filters out irrelevant attention results and keeps only useful information. In experiments on the representative data sets IIIT5K, SVT, SP, CT, IC03, IC13, and IC15, the proposed attention-based natural scene text recognition model achieves accuracies of 88.4%, 89.7%, 80.6%, 75.3%, 94.7%, 95.0%, and 79.1%, respectively. Its average accuracy across all test data sets is about 6.7% and 1.4% higher than that of the worst-performing and best-performing models compared in this paper, respectively.

As part of the laboratory's education quality assessment system project, this work is responsible for extracting knowledge points from the blackboard. For blackboard detection, the blackboard in a classroom has a uniform background and a relatively fixed position, so the YOLOv3 model is simplified to speed up detection; an accuracy of 95% is obtained on the collected data set. For text detection, the CRAFT algorithm is used, and tests on the collected data set confirm that it achieves high accuracy. For text recognition, the large number of Chinese characters makes the parameter count and computational cost relatively high if one-hot encoding is adopted. This paper instead encodes text labels with character embeddings trained on Baidu Baike using the Skip-Gram with Negative Sampling (SGNS) method, which reduces the dimensionality of the text label representation, and constructs a natural scene Chinese and English text recognition model that can recognize 4993 characters. To evaluate the effectiveness of the model in real scenes, data were collected and labeled in classrooms at Hunan University, producing a natural scene Chinese and English text recognition data set of 889 pictures. Experiments show that the recognition accuracy of the proposed text recognition model on this data set is 83%.
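
The AoA idea described in the abstract, gating an attention result by how well it matches the query, can be illustrated with a minimal sketch. The abstract gives no equations, so the code below assumes a commonly used formulation: the attended vector is concatenated with the query, one linear layer produces an "information" vector, a second produces a sigmoid gate, and the two are multiplied element-wise. All function names, weight shapes, and the choice of scaled dot-product attention are assumptions for illustration, not the thesis's implementation.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def linear(weight, bias, x):
    """Apply y = W x + b, with W given as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weight, bias)]


def attend(query, keys, values):
    """Conventional scaled dot-product attention for one query vector."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)]


def attention_on_attention(query, keys, values, w_info, b_info, w_gate, b_gate):
    """Assumed AoA formulation: gate the attention result by the query.

    w_info and w_gate are hypothetical weight matrices whose rows have
    length len(query) + len(values[0]).
    """
    attended = attend(query, keys, values)
    joint = list(query) + attended            # concatenate query and result
    info = linear(w_info, b_info, joint)      # candidate information vector
    gate = [sigmoid(g) for g in linear(w_gate, b_gate, joint)]
    # A gate near 0 suppresses an attention result that is irrelevant
    # to the query; a gate near 1 lets it through.
    return [g * i for g, i in zip(gate, info)]


if __name__ == "__main__":
    random.seed(0)
    d, n = 4, 3  # feature size and number of input vectors (arbitrary)
    rand_vec = lambda size: [random.uniform(-1, 1) for _ in range(size)]
    query = rand_vec(d)
    keys = [rand_vec(d) for _ in range(n)]
    values = [rand_vec(d) for _ in range(n)]
    w_info = [rand_vec(2 * d) for _ in range(d)]
    w_gate = [rand_vec(2 * d) for _ in range(d)]
    print(attention_on_attention(query, keys, values,
                                 w_info, [0.0] * d, w_gate, [0.0] * d))
```

In a trained model the two weight matrices would be learned; here they are random only so that the sketch runs end to end.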
Keywords/Search Tags: Deep neural network, Natural scene text recognition, Attention mechanism