Research And Application Of Digital Document Layout Analysis For Multi-scale And Multi-objectiv

Posted on: 2024-06-22
Degree: Master
Type: Thesis
Country: China
Candidate: Z C Gao
Full Text: PDF
GTID: 2568306923485504
Subject: Electronic information
Abstract/Summary:
Document Layout Analysis (DLA) aims to locate document elements such as tables, images, headings, and paragraphs in a document and to predict the semantic class of each element. DLA is the basis for tasks such as optical character recognition (OCR), image document understanding, unstructured information extraction, table parsing, and block recognition. As a pre-processing step for content extraction, DLA has the potential to capture rich information in historical and scientific documents at scale.

(1) To address the low accuracy of traditional object detection frameworks in identifying lists, titles, and tables on multiple PubLayNet sub-datasets in document layout analysis tasks, this paper proposes YOLOLayout, a multi-scale cross-feature fusion model for document layout analysis. The method embeds a multi-scale shallow visual module, MS-SVFEM, which uses a channel attention module, a spatial attention module, and multi-branch convolution to extract multi-scale information, while MS-CFF is embedded in a path aggregation network (PANet) and uses an attention mechanism to adaptively fuse features from different hierarchical levels. The method is validated on the PubLayNet dataset, the CDLA Chinese dataset, and a self-built HPA (Home Page Analysis) dataset. On PubLayNet the mAP is 87.9%, which is 2.2% higher than the baseline.

(2) A digital document parsing system is implemented based on the YOLOLayout model proposed in (1), creating a new automated processing system for document integration. The system provides key information extraction (block identification), table parsing, and document layout restoration (document format conversion). Key information extraction is designed as a set of multimodal processes that extract fields from documents. Document layout restoration converts document images into HTML, Word, and other document formats with an approximately matching layout style. For the problem of restoring the positions of document elements when digital documents are parsed offline into HTML files, a layout restoration method based on absolute positioning templates and dynamically generated templates is proposed to restore the original layout of digital documents to the maximum extent.

The system is evaluated on 20 Chinese PDF papers for the key information extraction and layout restoration functions, and on 20 table images for the table parsing function. The key information extraction function extracts Chinese titles, English titles, journals, and abstracts from the first page of the 20 PDF documents with 90% completeness, but authors and addresses suffer from missing and incorrect extractions. The layout restoration function generates HTML files with the same text style and retains 100% of the text content for the 20 PDF documents (non-nested text documents).

In summary, the overall recognition accuracy of the YOLOLayout model improves by 2.2% over the baseline model, with the accuracy on Table, List, and Title elements improving by 2.1%, 4.5%, and 0.7%, respectively. The digital document parsing system can automatically extract important information and table information from documents and restore single-column, double-column, and mixed single- and double-column layouts of digital documents.
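
The absolute-positioning idea behind the layout restoration step can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the `Block` record, its field names, and the `render_page` helper are hypothetical, and the sketch assumes each detected element already comes with a pixel bounding box and its recognized text. Each element is emitted as an HTML `div` pinned to its detected position, so single-column and double-column pages are handled the same way.

```python
from dataclasses import dataclass
from html import escape


@dataclass
class Block:
    """One detected layout element (all field names are hypothetical)."""
    label: str   # element class, e.g. "Title", "Text", "Table"
    text: str    # recognized text content of the element
    x: float     # left edge in page pixels
    y: float     # top edge in page pixels
    w: float     # width in pixels
    h: float     # height in pixels


def render_page(blocks, page_w, page_h):
    """Place every block at its detected position with absolute CSS positioning."""
    # The page container is relatively positioned so that child offsets
    # are measured from the page's top-left corner.
    parts = [
        f'<div style="position:relative;width:{page_w}px;height:{page_h}px">'
    ]
    for b in blocks:
        style = (
            f"position:absolute;left:{b.x}px;top:{b.y}px;"
            f"width:{b.w}px;height:{b.h}px"
        )
        # Escape the text so characters such as "<" survive as content.
        parts.append(
            f'<div class="{escape(b.label.lower())}" style="{style}">'
            f"{escape(b.text)}</div>"
        )
    parts.append("</div>")
    return "\n".join(parts)


# A made-up page with one title and two text columns.
blocks = [
    Block("Title", "A <sample> title", 50, 40, 500, 30),
    Block("Text", "Left column paragraph.", 50, 90, 240, 200),
    Block("Text", "Right column paragraph.", 310, 90, 240, 200),
]
html_page = render_page(blocks, page_w=600, page_h=800)
```

Because every element carries its own coordinates, the output does not depend on reading order. A dynamically generated template, as mentioned in the abstract, would presumably choose the container structure per page, which this sketch does not attempt.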
Keywords/Search Tags:Deep Learning, YOLOV5, Document Layout Analysis, Multi-Scale Features, Digital Document Analysis System