
A Study On Several Key Techniques In Text Image Analysis

Posted on: 2023-10-28, Degree: Doctor, Type: Dissertation
Country: China, Candidate: C X Ma, Full Text: PDF
GTID: 1528306902954529, Subject: Information and Communication Engineering
Abstract/Summary:
With the popularity of mobile devices and the development of storage technology, the number of images is growing explosively. Among them, text images, i.e., images containing textual information, have become an indispensable medium for recording, spreading and communicating information. Although text images contain rich information, that information is in the form of analog signals, i.e., pixel values, which are not easy to consume. If these analog signals can be digitized, it will be much more convenient to access useful information from these text images. In text images, text and tables are very common and important carriers of high-level semantic information. Automatically extracting text and tables from text images benefits many valuable applications, such as assistance for visually impaired people, real-time translation, content-based image retrieval, invoice/receipt recognition and checking, document image digitization, structured information extraction and so on. To achieve these goals, robust text detection and table extraction are critical prerequisites. However, due to the high variation of text and tables (e.g., variations in text font, color, scale, shape, orientation and language, as well as variations in table style, size and layout), extremely complex backgrounds (e.g., text-like or table-like background objects), and various distortions and artifacts caused by image capture (e.g., scanning noise, non-uniform illumination, low contrast, low resolution, shadowing, perspective distortion or geometric distortion), robust text detection and table extraction from images remain unsolved problems. In this thesis, after an in-depth study of advances in related fields, we conduct research on several key techniques in text image analysis, i.e., robust text detection, table detection and table structure recognition, which can be summarized as follows:

(1) We propose a text detection approach based on visual relationship detection. To handle texts with extreme aspect ratios as well as dense and arbitrary-shaped texts, we choose to detect texts in a bottom-up manner, i.e., detecting text segments first and then grouping them into text instances. However, previous bottom-up methods tend to mistakenly merge nearby text instances or to over-segment text instances with large intercharacter spacing into pieces. To address the text-line grouping problem robustly, we propose to formulate text detection as a visual relationship detection problem. For text-line grouping, a "link" relationship is then defined to indicate whether two text segments belong to the same text instance. We first present a relation network based pairwise link prediction approach, which leverages context information from the union box of a text segment pair to improve link prediction accuracy. We then further propose to construct a graph from the detected text segments and introduce a new Graph Convolutional Network (GCN) based link prediction approach, which leverages context information more effectively than the relation network to improve link prediction accuracy. Experimental results show that our approach achieves better text detection accuracy than previous methods, especially when dealing with texts with large intercharacter spacing and with dense and arbitrary-shaped texts.

(2) We propose a new table detection approach, which achieves higher localization accuracy with the help of table corners. Since a table is usually arranged as a regular grid and the texts within a table are aligned, the corners of a table have quite clear semantics, which is an important clue for high-precision table localization. Based on this observation, we propose to generate table proposals by detecting and grouping corner points. In this way, we can leverage the pixel-level cues revealed by the corner points on the heatmaps to improve the quality of table proposals significantly. Then, a Fast R-CNN module is used to reject non-table proposals and refine the bounding boxes of table proposals. Experimental results show that our approach achieves better performance than previous methods on three public table detection benchmarks while using only a lightweight ResNet-18 backbone network, and the performance gap is even larger under stricter evaluation metrics.

(3) We propose a new table structure recognition (TSR) approach, which follows the split-and-merge paradigm. Most existing deep learning based TSR methods assume that tables are axis-aligned, so they cannot be directly applied to geometrically distorted or even curved tables. Moreover, only a few methods take spanning cells into consideration, and their performance is still unsatisfactory. To address these problems, we first propose a novel spatial CNN based separation line prediction module to split the table into a grid of cells. As the spatial CNN can effectively propagate contextual information across the whole table image, improved robustness can be achieved for tables with large blank spaces and for curved tables. Then, we introduce a simple but effective Grid CNN module to recover the wrongly split cells, especially the spanning cells. In this module, the whole table is compactly represented as a grid, on which context information can be aggregated effectively with several stacked convolution layers to achieve excellent cell merging accuracy. Consequently, our approach achieves state-of-the-art performance on three public TSR benchmarks. Moreover, we have further demonstrated the advantages of our approach in recognizing tables with complex structures, large blank spaces, empty or spanning cells, as well as geometrically distorted or even curved tables, on a more challenging in-house dataset.
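
The text-line grouping step in contribution (1) can be illustrated with a minimal sketch. It assumes that a link predictor (relation network or GCN) has already produced a score for each candidate pair of text segments, and it groups segments by taking connected components over the accepted links. The function name `group_segments`, the `link_scores` dictionary layout and the 0.5 threshold are hypothetical choices made for illustration; the abstract does not specify how the thesis turns predicted links into text instances.

```python
from collections import defaultdict


def group_segments(num_segments, link_scores, threshold=0.5):
    """Group text segments into text instances from pairwise link scores.

    num_segments: number of detected text segments, indexed 0..num_segments-1.
    link_scores:  dict mapping a segment pair (a, b) to a predicted link score
                  (hypothetical format; any link predictor could fill it).
    threshold:    assumed cut-off above which two segments count as linked.
    """
    # Union-find forest: each segment starts as its own group.
    parent = list(range(num_segments))

    def find(i):
        # Walk up to the root, halving the path along the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Merge the groups of every pair whose link score passes the threshold.
    for (a, b), score in link_scores.items():
        if score >= threshold:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a

    # Collect segments by root; each group is one text instance.
    groups = defaultdict(list)
    for i in range(num_segments):
        groups[find(i)].append(i)
    return sorted(groups.values())


# Five segments: 0-1-2 are strongly linked, 2-3 is rejected, 3-4 is linked.
scores = {(0, 1): 0.9, (1, 2): 0.8, (2, 3): 0.1, (3, 4): 0.7}
print(group_segments(5, scores))  # [[0, 1, 2], [3, 4]]
```

In this sketch, the rejected link between segments 2 and 3 is what keeps two nearby text instances apart, which is the failure mode (mistaken merging) that more accurate link prediction is meant to reduce.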
Keywords/Search Tags:Text image analysis, Arbitrary-shaped text detection, Visual relationship detection, Relation network, Graph convolutional network, Table detection, Corner detection, Table structure recognition, Split-and-merge