Table Structure Recognition On Document Images

Posted on:2023-01-01

Degree:Doctor

Type:Dissertation

Country:China

Candidate:W Y Xue

Full Text:PDF

GTID:1528306845997189

Subject:Computer Science and Technology

Abstract/Summary:

PDF Full Text Request

A table arranging data in rows and columns is a very effective data structure,which has been widely used in our daily work and life,such as conference schedules,financial reports,and credit card bills.Though human can easily understand tables with different layouts and styles,it remains a great challenge for machines to automatically recognize the structure of various tables.Considering the massive amount of tabular data with unstructured formats(e.g.,image and PDF files)in online and offine documents,an automatic table structure recognition system will facilitate large-scale tabular data analysis.Early methods for table structure recognition usually utilize rule-based or statistical techniques with hand-crafted features,working well in only constrained settings and lacking generalization.Recently,deep learning-based table structure recognition approaches model a table as either the markup sequence or the adjacency matrix between different table cells,failing to address the importance of the logical location of table cells.Considering these weaknesses in existing methods,this thesis takes the graph structure as the core and investigates a few key problems for table structure recognition:(1)This thesis proposes a graph-based table structure representation according to the adjacency relation between different cells,and the cell logical location can be in-ferred on the graph.Specifically,we first detect the neighbors of each table cell through a cell relationship network.A distance-based sample weight is proposed for the class imbalance problem.Then,according to the detected relationships,a weighted graph is created to infer the logical location of each cell.To verify the effectiveness of the proposed method,we collect and annotate a dataset containing238 Chinese medical documents with 476 tables.This work preliminarily explores the graph representation of a table and experimental results demonstrate its effec-tiveness.(2)This thesis reformulates the problem of table structure recognition as the table graph reconstruction,which requires the model to jointly predict the cell spatial location and the cell logical location.To solve such a problem,this thesis pro-poses an end-to-end trainable table graph reconstruction network,which uses a segmentation-based module to detect the cell spatial location and solves the cell logical location prediction as an ordinal node classification problem.Moreover,we contribute a new benchmark generated from TABLE2LATEX-450 K dataset with 350 K table graph annotations.This work combines the table graph with the deep neural networks to jointly predict the cell spatial location and logical loca-tion,which is more generalized than existing methods and easy to connect with downstream table analysis and understanding tasks.(3)This thesis makes some improvements and proposes a more powerful table graph reconstruction network.For the various cell scale and unprecise spatial location de-tection,based on the cell segmentation,this work further detects key points within each cell and regress the relative distance from these points to the cell vertexes.For the information block from distant nodes,this replaces the sparse graph convolu-tional network with the masked Transformer.Experiments demonstrate the effec-tiveness of these improvements for both cell spatial location detection and logical location prediction.(4)This thesis proposes a question answering system for unstructured table images,which combines the proposed table graph reconstruction network with a weakly supervised table parsing model.Given a table image with a question,the proposed system first utilizes a table recognition module to transform an unstructured table into a semi-structured format.The system then tokenizes the table/question and employs a table parser to predict the answer.This demo shows how the table struc-ture recognition model plays a role in the downstream table parsing task.With the above studies,this thesis builds a complete end-to-end table structure recognition algorithm,shows the effectiveness in the table question and answering task,and provides new ideas for further research.

Keywords/Search Tags:

Table Structure Recognition, Document Understanding, Visual Question and Answering, Object Detection, Visual Relationship Detection, Ordinal Classification, Graph Neural Networks, Transformer

PDF Full Text Request

Related items

1	Research Of Visual Question Answering Method Based On Deep Learning
2	Research On Visual Object Detection And Relationship Understanding
3	Research On Question Understanding Method Of Knowledge Graph Question Answering System
4	Visual Question Answering Based On Deep Reasoning
5	Research On Question Answering Systems For Visual Content And Emotion Perception
6	Research On Situational Reasoning Visual Question Answering Based On Graph Neural Network
7	Research On Key Algorithms Of Visual Question Answering Based On External Knowledge And Semantic Understanding
8	Research On Visual Question Answering Algorithm Based On Deep Learning
9	Research On Question Answering Based On Unstructured Document Understanding
10	Research On Visual Question Answering Technology Based On Knowledge Graph