
Keyword Retrieval in Printed Images Based on Neural Networks

Posted on: 2022-03-06    Degree: Master    Type: Thesis
Country: China    Candidate: J G T L T Y Ba    Full Text: PDF
GTID: 2518306542955309    Subject: Software engineering
Abstract/Summary:
With the development of the Internet, printed images on the network are growing explosively. As one of the main information carriers, images have become an everyday object of information retrieval, so how to find the required text information in printed images has long been a research hotspot. Previous studies on printed-image feature extraction usually combine two different convolutional neural networks or change the parameters of a network model, but seldom explain why the two models are combined or why the parameters are changed. In view of these problems, this paper first determines through experiments which of GoogLeNet, ResNet101 and LeNet-5 is most suitable for keyword retrieval in printed images, and then improves the most suitable model to further raise the retrieval accuracy. The specific research work is as follows.

(1) A dataset for model classification is constructed from images generated for the 3765 level-1 Chinese characters of the national standard character set. A single-font dataset is generated first and then augmented so that each character image of each font has 10 variants, such as distorted, noisy and rotated versions. The dataset is split into a training set, a validation set and a test set; the training set does not overlap with the validation and test sets, and the fonts of the validation and test sets differ from those of the training set. Two-font and three-font datasets are then generated and split in the same way.

(2) The LeNet-5, GoogLeNet and ResNet101 convolutional neural networks are trained to extract features from printed images, and GoogLeNet achieves higher accuracy on every test set than ResNet101 and LeNet-5. To improve GoogLeNet's accuracy for keyword retrieval in printed images, a network model combining the convolutional block attention module (CBAM) with GoogLeNet is built. During the experiments it is found that the test-set accuracy depends on where CBAM is inserted into the GoogLeNet structure, so different insertion positions are studied. Comparing the results of the different positions shows that when CBAM is inserted at the first convolution layer and at the seventh Inception module of GoogLeNet, the GoogLeNet + CBAM model improves accuracy over GoogLeNet by 0.2% and 2.7% on the same test sets of the datasets with two or more fonts (a minimal sketch of such an insertion is given below).

(3) To further improve the accuracy of keyword retrieval in printed images, the convolutional attention mechanisms CBAM and BAM are used in parallel to extract features before the first convolution layer and after the seventh Inception module of GoogLeNet, and the extracted features are cascaded and fused. The experimental results show that GoogLeNet + CBAM + BAM improves accuracy on the same test sets by 0.25% and 22%, respectively; compared with GoogLeNet + CBAM, GoogLeNet + CBAM + BAM is 0.13% more accurate on the same test set, with a maximum gain of 26.6%.
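The abstract does not give the implementation details of the CBAM insertion, so the following is only a minimal PyTorch sketch of the idea: a standard CBAM block (channel attention followed by spatial attention) is wrapped around two stages of torchvision's GoogLeNet. Mapping the "first convolution layer" and the "seventh Inception structure" to torchvision's `conv1` (64 channels) and `inception4e` (832 channels), as well as the reduction ratio of 16, are assumptions made for illustration; the 3765-way output corresponds to the level-1 character classes described above.

```python
# Sketch only: a standard CBAM block attached to torchvision's GoogLeNet at two
# assumed positions ("first conv layer" -> conv1, "seventh Inception" -> inception4e).
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import models


class ChannelAttention(nn.Module):
    """Channel attention: shared MLP over global average- and max-pooled features."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1, bias=False),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1, bias=False),
        )

    def forward(self, x):
        avg = self.mlp(F.adaptive_avg_pool2d(x, 1))
        mx = self.mlp(F.adaptive_max_pool2d(x, 1))
        return torch.sigmoid(avg + mx)


class SpatialAttention(nn.Module):
    """Spatial attention: 7x7 conv over channel-wise average and max maps."""
    def __init__(self, kernel_size=7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2, bias=False)

    def forward(self, x):
        avg = x.mean(dim=1, keepdim=True)
        mx = x.amax(dim=1, keepdim=True)
        return torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))


class CBAM(nn.Module):
    """CBAM: refine a feature map with channel attention, then spatial attention."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.ca = ChannelAttention(channels, reduction)
        self.sa = SpatialAttention()

    def forward(self, x):
        x = x * self.ca(x)
        return x * self.sa(x)


# Hypothetical insertion points: after the stem convolution and after inception4e.
model = models.googlenet(aux_logits=False, num_classes=3765)
model.conv1 = nn.Sequential(model.conv1, CBAM(64))                # first conv layer, 64 ch
model.inception4e = nn.Sequential(model.inception4e, CBAM(832))   # assumed "7th" Inception, 832 ch

logits = model(torch.randn(1, 3, 224, 224))  # character-class scores, shape (1, 3765)
```

Wrapping the existing modules in `nn.Sequential` leaves GoogLeNet's own forward pass untouched, which is also why the insertion position can be varied easily for the kind of position-comparison experiments described above.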
(4) The back end of the retrieval system is implemented with the PyTorch framework in Python, the front end with PyQt5, and the front end and back end are connected through the paramiko interface. Through a large number of keyword retrieval experiments, thresholds are set for the six fonts used in keyword retrieval, so that a front-end user can submit the printed image to be searched, the keywords, and the font used to render the keywords; the back end extracts features with the GoogLeNet + CBAM + BAM model and returns the keywords found in the image to the front-end keyword retrieval system (a sketch of such a paramiko connection is given below).
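The abstract names paramiko as the bridge between the PyQt5 front end and the PyTorch back end but does not describe the interface, so the sketch below only illustrates one possible shape of that connection; the host name, credentials, remote paths and the `retrieve.py` command line are hypothetical placeholders rather than the system's actual API.

```python
# Hypothetical front-end helper: send a query image to the back-end host over SSH
# and read back the retrieval result. All names, paths and flags are placeholders.
import paramiko


def retrieve_keywords(image_path, keyword, font,
                      host="backend-host", user="user", password="secret"):
    client = paramiko.SSHClient()
    client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    client.connect(host, username=user, password=password)
    try:
        # Upload the printed-page image selected in the PyQt5 front end.
        sftp = client.open_sftp()
        remote_image = "/tmp/query.png"
        sftp.put(image_path, remote_image)
        sftp.close()

        # Run the (hypothetical) back-end script that applies GoogLeNet + CBAM + BAM
        # and prints the retrieved keywords to stdout.
        cmd = (f"python retrieve.py --image {remote_image} "
               f"--keyword '{keyword}' --font '{font}'")
        _stdin, stdout, _stderr = client.exec_command(cmd)
        return stdout.read().decode("utf-8")
    finally:
        client.close()


# Example call from the front end:
# result = retrieve_keywords("page.png", "网络", "宋体")
```

In a setup like this, the PyQt5 side only handles file selection and result display, while all GoogLeNet + CBAM + BAM inference stays on the back-end host.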
Keywords/Search Tags: Convolutional neural network, Keyword retrieval, Convolutional attention mechanism, Data augmentation, Feature fusion