| Extracting key information from document images such as historical documents, receipts, orders, and credit notes plays an important role in office automation, including efficient archiving and compliance checking. Traditional methods are mainly template-based: they first match a document against a template and then extract information from fixed regions according to that template. These methods depend on the document layout and are less effective for documents whose layout has not been seen before. The result is low robustness across diverse datasets, which makes the methods difficult to apply and transfer widely. In fact, the information in a document often has strong contextual relationships; for example, there is a strong correlation between "total price" and its corresponding price value "$ 23.34". Traditional methods neither model nor use this contextual information. To take context into account, time-series (sequential) methods based on deep learning have been proposed, and they improve model robustness to a certain extent. Time-series models can capture some of the preceding and following context. However, for complex documents, such as those with tens to a hundred text regions, it is difficult for a time-series model to capture relationships between distant regions. In addition, important information in a document often differs in font size. For example, the total price "$ 23.34" is usually printed in a larger font than the price of a single item. Using this spatial and image information can further improve the capability and robustness of document analysis. To address these problems in current automatic document analysis methods, this paper proposes an end-to-end spatial graph reasoning neural network for key information extraction. The main contributions of this paper are: (1) Spatial information is considered in our 
model, including the relative position of associated regions and the size of the image regions. This information tells the model how associated regions are positioned relative to one another and how their sizes compare, which guides it to focus on spatially correlated regions and improves performance; (2) An end-to-end spatial graph reasoning neural network is proposed for key information extraction. A spatial information encoding module encodes the spatial information of the text regions in the document as the edge information of the graph reasoning neural network, and the text features of the text regions are extracted as its node information. The entire document is thus embedded in the graph network for reasoning, which effectively captures the relevance of information in different regions; (3) We validate the effectiveness of the proposed method on public standard datasets, such as the SROIE competition dataset, where our model outperforms all compared methods. In addition, we make a first attempt at processing the dataset so that the layouts in the test set do not appear in the training set. This split is used to evaluate the model on unseen layouts and to verify its robustness. |
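
To make contribution (2) more concrete, the following is a minimal sketch of how relative position and relative region size could be turned into edge features between text regions. It is an illustration under stated assumptions, not the encoding used in the paper: the `TextRegion` structure, the function names, and the choice to normalise offsets by the source box size are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TextRegion:
    """Axis-aligned box of one text region (hypothetical structure)."""
    text: str
    x: float  # left edge
    y: float  # top edge
    w: float  # width
    h: float  # height


def edge_features(a: TextRegion, b: TextRegion) -> List[float]:
    """Spatial features for the directed edge a -> b.

    Assumed layout: [dx, dy, width ratio, height ratio]. Offsets are
    divided by the source box size, an assumption made here so that the
    features do not depend on image resolution.
    """
    dx = (b.x - a.x) / a.w
    dy = (b.y - a.y) / a.h
    return [dx, dy, b.w / a.w, b.h / a.h]


def build_edges(regions: List[TextRegion]) -> List[List[List[float]]]:
    """Fully connected edge tensor: edges[i][j] describes region i -> j."""
    n = len(regions)
    return [[edge_features(regions[i], regions[j]) for j in range(n)]
            for i in range(n)]


# Toy document: a "total price" label and its value in a larger font.
regions = [
    TextRegion("total price", x=10, y=200, w=80, h=20),
    TextRegion("$ 23.34", x=110, y=195, w=60, h=30),
]
edges = build_edges(regions)
print(edges[0][1])
```

In this toy case the height ratio on the edge from the label to the value is greater than 1, which is the kind of font-size cue described above. In a graph network, such edge vectors would accompany the text features stored on the nodes.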