
Complex Document Layout Segmentation Based On Deep Learning

Posted on: 2022-08-18
Degree: Master
Type: Thesis
Country: China
Candidate: J Yao
Full Text: PDF
GTID: 2518306563974289
Subject: Electronic Science and Technology
Abstract/Summary:
With the advent of the information age, computer document analysis and recognition has become important for content recognition, content-based retrieval, and related applications, and is therefore one of the key topics in information processing. Document layout segmentation is a key step in document analysis and recognition: it divides a document page into regions such as background, text, tables, and pictures. The accuracy of layout segmentation directly affects the overall performance of a document analysis and recognition system. In recent years, deep learning methods have brought great progress to document layout segmentation. However, because document layouts are flexible and complex, region sizes differ widely, and element shapes vary, high-precision document layout segmentation remains a major challenge.

This thesis studies deep learning based document layout segmentation and, in view of the characteristics above, proposes a multi-scale segmentation algorithm for complex document layouts based on an encoder-decoder architecture. The algorithm first extracts image features with a feature extraction network, then feeds the feature map into a multi-scale feature extraction network to generate a multi-channel feature map containing multi-scale features. Finally, a decoder up-samples the low-resolution feature map to achieve pixel-level layout segmentation. The main contributions of this thesis are as follows:

(1) A comparative study of existing semantic segmentation networks. Several types of semantic segmentation networks are analyzed through a series of comparative experiments, and a network model suited to document layout segmentation is selected.

(2) A multi-scale feature extraction network is designed to address the local ambiguity problem caused by the structural characteristics of document layouts. The network consists of two modules. The deconvolution pyramid pooling module fuses multi-scale information by constructing a feature pyramid, which improves segmentation accuracy for small regions. The location attention module integrates long-range context information by capturing the spatial relations between pixels, which improves segmentation accuracy for large regions. The experimental results show that fusing the two modules improves the multi-scale feature representation capability of the network and effectively alleviates the local ambiguity problem.

(3) To further improve segmentation performance, a multi-scale document layout segmentation network based on an encoder-decoder architecture is proposed. A decoder with a bottleneck structure up-samples the low-resolution feature maps so that the deep feature maps contain richer semantic information. Unlike the conventional encoder-decoder architecture, this thesis places a multi-scale feature extraction network between the encoder and the decoder, which enhances the feature representation capability of the deep feature maps and further improves segmentation accuracy.

To evaluate the effectiveness of the method, comparative experiments are carried out on three public datasets of differing layout complexity: PubLayNet, RDCL, and DSSE-200. The experimental results demonstrate that the multi-scale layout segmentation algorithm based on an encoder-decoder architecture achieves high-precision segmentation of complex document layouts and generalizes well.
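The two ideas behind the multi-scale feature extraction network can be illustrated with a minimal pure-Python sketch. It is not the thesis implementation. The function names `average_pool` and `position_attention` are hypothetical, and the toy inputs are made up. The pooling step only shows how one feature map is summarized at several grid sizes, and it omits the deconvolution-based up-sampling and fusion. The attention step assumes plain dot-product similarity with a softmax and a residual sum, with no learned projections, which may differ from the location attention module described in the abstract.

```python
import math


def average_pool(grid, bins):
    """Average-pool a 2D map (list of rows) into a bins x bins grid.

    Assumes bins is no larger than the height or width of the map.
    """
    h, w = len(grid), len(grid[0])
    pooled = []
    for i in range(bins):
        r0, r1 = i * h // bins, (i + 1) * h // bins
        row = []
        for j in range(bins):
            c0, c1 = j * w // bins, (j + 1) * w // bins
            cells = [grid[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            row.append(sum(cells) / len(cells))
        pooled.append(row)
    return pooled


def position_attention(features):
    """Let every pixel aggregate context from all pixels.

    features: list of N per-pixel feature vectors (lists of C floats).
    Returns N vectors: each input plus a softmax-weighted sum of all inputs.
    Simplification (assumption): no learned query/key/value projections.
    """
    out = []
    for query in features:
        # Dot-product similarity between this pixel and every pixel.
        scores = [sum(a * b for a, b in zip(query, key)) for key in features]
        top = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Weighted sum of all pixel features: the long-range context.
        context = [
            sum(wgt * vec[c] for wgt, vec in zip(weights, features))
            for c in range(len(query))
        ]
        # Residual connection keeps the original local feature.
        out.append([x + y for x, y in zip(query, context)])
    return out


# Toy 4x4 single-channel feature map, summarized at two pyramid levels.
feature_map = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
pyramid = {bins: average_pool(feature_map, bins) for bins in (1, 2)}

# Three pixels with 2-channel features; pixels 0 and 2 look alike.
pixels = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
attended = position_attention(pixels)
```

In this sketch, coarse pyramid levels summarize large areas while finer levels keep local detail, and the attention weights let similar pixels reinforce each other regardless of how far apart they are on the page.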
Keywords/Search Tags: Deep Learning, Document Layout Segmentation and Recognition, Multi-scale Feature, Attention Mechanism, Encoder-decoder Architecture