Research On OCR Text Sorting Based On Multimodal And Graph Neural Network For Rich Text Image
Posted on: 2024-07-23 | Degree: Master | Type: Thesis | Country: China | Candidate: Z Y Zhao | GTID: 2568306944961899 | Subject: Communication Engineering (including broadband network, mobile communication, etc.) (Professional Degree)
Abstract/Summary:
With the development of science and technology, rich-text image data has grown rapidly, and more and more image information flows into audit and supervision systems. When faced with an image containing rich text, such a system first uses optical character recognition (OCR) to extract the text in the image. The extracted text is usually ordered by simple top-to-bottom, left-to-right position, but for rich-text images with multiple layouts or scattered text regions, this position-based ordering often scrambles the reading order between text blocks and causes information to be lost. Most previous algorithms rely on positional logic; they work reasonably well in traditional scenarios but perform poorly in today's streaming-media scenarios. Moreover, no deep-learning-based algorithm exists in this field, nor is there a corresponding data annotation standard or training scheme. To address this problem, this thesis studies OCR text sorting using deep-learning multimodal techniques and graph neural networks. The main research contents and innovations of this thesis are as follows:
(1) This thesis proposes an image text sorting model that combines visual, text, and spatial features through multimodal deep learning and a graph neural network. The model's processing flow is as follows: first, the rich-text image is passed through a visual feature extraction network; then the image features within each OCR region are pooled, and
then the text features of each OCR region are extracted. After the image and text features are fused, they are embedded into a graph neural network according to the regions' spatial location features. Because of how the OCR text regions are arranged in the image, the resulting graph is non-Euclidean data, which graph neural networks are well suited to handle. The model therefore iterates the graph neural network so that each node is updated through interaction with the other nodes. Finally, the overall order is determined from the relationships between nodes in the graph.
(2) This thesis formulates an annotation standard for the OCR text sorting problem and, following this standard, labels an OCR text sorting dataset for streaming-media scenarios. Because there is little prior work on OCR text sorting, both the annotation standard and the dataset had to be built from scratch. The dataset consists of 2,000 rich-text images, each containing between 4 and 150 text regions, for about 60,000 text regions in total.
(3) To verify the model's performance, three comparative experiments and four ablation experiments were designed. First, the combination that performs best on the test set is selected from three visual feature extraction networks and two text feature extraction networks. Next, three existing OCR text sorting algorithms are implemented and compared with the proposed model; the results show that our model scores significantly higher on the test set than the existing schemes. Finally, four ablation experiments are carried out on the three modules (visual branch, text branch, and graph neural network) to analyze how each module affects the overall ranking results. The
results show that the visual features and the graph neural network both have clear positive effects on the overall ranking results. Overall, the experimental results show that introducing multimodal features and a graph neural network greatly improves the accuracy of OCR text sorting.
Keywords/Search Tags: Document Layout Analysis, Multimodal, Graph Neural Network, OCR, Image Text Sorting
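To make the failure mode of the simple top-to-bottom, left-to-right ordering concrete, here is a minimal Python sketch of such a position-based sort. The `Box` class, the `position_sort` function, and the `line_tol` threshold are illustrative assumptions, not any of the algorithms the thesis compares against.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """Hypothetical OCR region: recognized text plus its bounding box."""
    text: str
    x0: float  # left edge
    y0: float  # top edge
    x1: float  # right edge
    y1: float  # bottom edge


def position_sort(boxes, line_tol=10.0):
    """Naive top-to-bottom, left-to-right ordering.

    Boxes whose top edges differ by less than line_tol (an assumed
    threshold) are treated as one visual line, then read left to right.
    """
    ordered = sorted(boxes, key=lambda b: (b.y0, b.x0))
    lines, current = [], []
    for b in ordered:
        # Start a new visual line when the vertical gap is large enough.
        if current and abs(b.y0 - current[0].y0) >= line_tol:
            lines.append(sorted(current, key=lambda c: c.x0))
            current = []
        current.append(b)
    if current:
        lines.append(sorted(current, key=lambda c: c.x0))
    return [b.text for line in lines for b in line]


# Two-column layout: the intended reading order is A1, A2, B1, B2.
boxes = [
    Box("A1", 0, 0, 150, 20), Box("A2", 0, 30, 150, 50),
    Box("B1", 200, 0, 350, 20), Box("B2", 200, 30, 350, 50),
]
print(position_sort(boxes))  # ['A1', 'B1', 'A2', 'B2']
```

Because the rows of the two columns line up, the sort interleaves the columns instead of reading them one after the other, which is the reading-order confusion the abstract describes for multi-layout images.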
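The processing flow in contribution (1) can be illustrated in plain Python as three steps: build a graph over the OCR regions from their spatial positions, fuse each region's visual, text, and spatial features, and let neighboring nodes exchange information. This is only a sketch under stated assumptions. The k-nearest-neighbor graph, concatenation fusion, and fixed mean-aggregation update (`knn_edges`, `fuse`, `message_pass`, and the `k` and `alpha` parameters) are hypothetical choices, not the thesis's actual graph construction or its learned GNN layers.

```python
import math


def knn_edges(centers, k=2):
    """Hypothetical graph construction: link each OCR region to its k
    nearest regions by the Euclidean distance between box centers."""
    edges = set()
    for i, (xi, yi) in enumerate(centers):
        dists = sorted(
            (math.hypot(xi - xj, yi - yj), j)
            for j, (xj, yj) in enumerate(centers)
            if j != i
        )
        for _, j in dists[:k]:
            edges.add((i, j))
            edges.add((j, i))  # keep the graph undirected
    return edges


def fuse(visual, textual, spatial):
    """Assumed early fusion: concatenate per-region feature vectors."""
    return [v + t + s for v, t, s in zip(visual, textual, spatial)]


def message_pass(features, edges, alpha=0.5):
    """One untrained update step: blend each node's features with the
    mean of its neighbors' features (alpha is the neighbor weight)."""
    neighbors = {i: [] for i in range(len(features))}
    for i, j in edges:
        neighbors[i].append(j)
    updated = []
    for i, h in enumerate(features):
        nbrs = neighbors[i]
        if not nbrs:
            updated.append(list(h))  # isolated node keeps its features
            continue
        mean = [sum(features[j][d] for j in nbrs) / len(nbrs) for d in range(len(h))]
        updated.append([(1 - alpha) * a + alpha * m for a, m in zip(h, mean)])
    return updated
```

For example, with centers (0, 0), (1, 0), (10, 0) and `k=1`, region 1 ends up linked to both other regions while regions 0 and 2 each link only to region 1; repeated calls to `message_pass` then spread information along those links, which is the node-to-node interaction the abstract attributes to the iterated graph neural network.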
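The abstract says the overall order is judged from the relationships between nodes but does not specify how. One simple possibility, sketched here under that assumption, is to take a hypothetical pairwise score `p[i][j]` (the model's confidence that region `i` is read before region `j`) and rank regions by a Borda-style count. The function `decode_order` and the matrix format are illustrative, not the thesis's decoder.

```python
def decode_order(p):
    """Turn pairwise 'i is read before j' scores into a total order.

    p is an assumed n x n matrix of scores in [0, 1]; each region is
    ranked by the sum of its scores for preceding every other region
    (a Borda-style count). Ties fall back to the region index.
    """
    n = len(p)
    scores = [sum(p[i][j] for j in range(n) if j != i) for i in range(n)]
    # A higher score means the region precedes more regions, so read it earlier.
    return sorted(range(n), key=lambda i: (-scores[i], i))


# Hypothetical scores whose intended reading order is region 2, then 0, then 1.
p = [
    [0.0, 0.7, 0.1],
    [0.3, 0.0, 0.2],
    [0.9, 0.8, 0.0],
]
print(decode_order(p))  # [2, 0, 1]
```

A Borda-style count is used here mainly because it is short and always yields a total order, even when the pairwise predictions contain cycles; a trained system could equally use another aggregation scheme.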