
Research On Document Layout Analysis Algorithm Based On Intra-domain And Inter-domain Knowledge

Posted on: 2022-12-04  Degree: Doctor  Type: Dissertation
Country: China  Candidate: X J Wu
GTID: 1488306773483744  Subject: Automation Technology
Abstract/Summary:
The goal of Document Layout Analysis (DLA) is to decompose document images into high-level semantic regions such as figures, tables, text, and background. Document layout analysis is a fundamental task underlying Optical Character Recognition (OCR), document understanding, table recognition, and image retrieval. Documents are classified into categories according to differences in document style (e.g., academic, magazine), and each category constitutes an independent domain. In practical applications, the considerable distribution difference between domains causes a trained model to lose noticeable performance when it is applied across domains. This performance loss comes from two sources. First, the edge knowledge and feature knowledge of the samples are not fully utilized before the cross-domain step. Second, the cross-domain connection information and the key sample information in the target domain are insufficiently mined. This dissertation discusses the problems of inadequate knowledge mining within domains and insufficient knowledge generalization between domains in current document layout analysis tasks. It focuses on four aspects: sample edge knowledge mining, sample feature knowledge mining, sample cross-domain connection information mining, and cross-domain key sample information mining. The following innovative results are obtained:

(1) To address the problem of insufficient sample edge knowledge mining, we propose a document image layout analysis method based on an explicit edge embedding network. At present, many deep learning-based document layout analysis methods learn the knowledge of convolutional networks directly from color channels. Many high-frequency structures in images (especially image edges) contain richer information, which previous DLA studies have neglected. To make up for the insufficient mining of sample edge knowledge, we devise a new document layout analysis framework (E³Net) with explicit edge embedding networks. Specifically, E³Net uses edge embedding blocks and dynamic skip connection blocks to generate more detailed features. To enhance the practicality of the framework, E³Net uses a lightweight fully convolutional subnet as the backbone. The edge embedding blocks explicitly incorporate edge information obtained from document images. The dynamic skip connection blocks derive color and edge representations with learnable weights. We conduct experiments with the proposed framework on three document layout analysis benchmarks, demonstrating the superiority of E³Net over previous work in terms of effectiveness and efficiency.

(2) To address the problem of insufficient sample feature knowledge mining, we introduce a document layout analysis method based on dynamic residual feature fusion. Information is lost between sample features, and implicit knowledge is ignored. We propose an end-to-end joint network named Dynamic Residual Fusion Network (DRFN) to extract the knowledge contained in the sample features. Specifically, we design a dynamic residual feature fusion module that fully utilizes low-dimensional information while maintaining high-dimensional category information. To solve the problem of model overfitting caused by insufficient data, we propose a dynamic selection mechanism for efficient fine-tuning on limited training data. We conduct experiments on several challenging datasets to demonstrate the effectiveness of DRFN.

(3) To address the problem of unbalanced cross-domain connection information between samples, we propose an unsupervised cross-domain document layout analysis method based on style guidance. To build a general-purpose DLA framework that takes advantage of the knowledge contained in the samples themselves, one crucial factor is how to extract the implicit information that exists between domains. Because document objects are diverse in layout, size, aspect ratio, texture, and so on, it is challenging to create a DLA framework with strong generalization ability. Many researchers address this drawback by synthesizing data to construct large training sets. However, synthetic training data differs in style and is inconsistent in quality. Furthermore, there is a large gap between the source and target data. We propose an unsupervised cross-domain DLA framework based on document style guidance, which integrates document quality assessment and cross-domain analysis into a unified framework. The framework consists of three components: the document layout generator (GLD), the document element decorator (GED), and the document style discriminator (DSD). GLD is used for document layout generation, GED for filling in document layout elements, and DSD for document quality assessment and cross-domain guidance. First, we apply GLD to predict element positions for the generated documents. Then, we design a new algorithm based on aesthetic guidance to populate those positions. Subsequently, we use contrastive learning to assess document quality. Moreover, we design a novel strategy to transform the document quality assessment component into a cross-domain style guidance component. We perform a series of experiments to demonstrate that the style-guided unsupervised cross-domain document layout analysis method achieves strong performance.

(4) To alleviate the insufficient utilization of cross-domain key information, we propose a key sample selection method based on external knowledge guidance. For some special samples, relying only on unsupervised cross-domain methods has limited effect. To further extend the generality of the DLA model, we introduce human-in-the-loop (HITL) collaborative intelligence. HITL pushes the model to learn from unknown problems by adding a small amount of knowledge-based data, which helps the model generalize. HITL approaches select key samples by confidence; however, using confidence to find key samples is not suitable for DLA tasks. We propose a key sample selection (KSS) method that finds key samples in high-level tasks (such as semantic segmentation) more accurately through the cooperation of agents. Once key samples are selected, they are sent to a human for labeling, and the model parameters are updated with the labeled samples. We re-examine the learning system from the perspective of reinforcement learning and design a sample-based update strategy that effectively improves the agent's ability to accept new samples. Experiments are conducted on several challenging datasets to demonstrate the effectiveness of the key sample selection method guided by external knowledge.
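
For orientation, the confidence-based selection that contribution (4) describes as the usual HITL baseline might look roughly like the sketch below. This is not the proposed KSS method, whose agent-based details the abstract does not give. The function names, the scoring rule (mean of the per-pixel maximum class probability), and the toy data are all assumptions made for illustration.

```python
from typing import List, Sequence

# One segmentation output: for each pixel, a list of class probabilities.
# (Hypothetical representation chosen for this sketch.)
ProbMap = Sequence[Sequence[float]]


def mean_confidence(prob_map: ProbMap) -> float:
    """Average, over all pixels, of the highest class probability."""
    if not prob_map:
        raise ValueError("prob_map must contain at least one pixel")
    return sum(max(pixel) for pixel in prob_map) / len(prob_map)


def select_key_samples(prob_maps: Sequence[ProbMap], budget: int) -> List[int]:
    """Return indices of the `budget` least-confident samples, least confident first."""
    scores = [mean_confidence(p) for p in prob_maps]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i])
    return ranked[:budget]


# Three toy "images" with two pixels each over four classes
# (figure, table, text, background). Values are made up.
predictions = [
    [[0.90, 0.05, 0.03, 0.02], [0.80, 0.10, 0.05, 0.05]],  # confident
    [[0.30, 0.30, 0.20, 0.20], [0.40, 0.30, 0.20, 0.10]],  # uncertain
    [[0.60, 0.20, 0.10, 0.10], [0.50, 0.30, 0.10, 0.10]],  # in between
]

# Indices of the samples that would be sent to a human for labeling.
to_label = select_key_samples(predictions, budget=2)
```

Averaging over pixels collapses each image to a single score, so a page that is mostly easy but contains one badly segmented table can still look confident overall. That is one plausible reason a plain confidence criterion fits dense prediction tasks poorly.
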
Keywords/Search Tags:Document layout analysis, deep learning, explicit edge embedding, dynamic residual feature fusion, unsupervised