
Research on High-Throughput Detection Technology of Specific Documents

Posted on: 2021-01-18
Degree: Master
Type: Thesis
Country: China
Candidate: J D Ren
GTID: 2428330614453861
Subject: Computer technology
Abstract/Summary:
With the spread of Internet technology and informatization, digital documents have come into wide use. As they have proliferated, information security problems have begun to emerge: sensitive documents intended only for specific groups of people are sometimes uploaded by mistake to library websites, leading to information leakage. Investigation indicates that leaks of such specific documents from library websites have been trending upward in recent years. These leaks pose a serious threat to information security and the public interest and can cause irreversible economic and other losses. Security checks on documents shared by library websites have therefore become an important practical requirement. Because a large number of documents are uploaded to library websites every day, an urgent research question is how to design a fast and accurate high-throughput detection algorithm for document images: one that screens every document image uploaded to a website each day at low cost and determines whether any of them is a specific (sensitive) document. As no good solution to this problem currently exists, this thesis takes a library website, referred to as site A, as its research object and studies the problem in depth. The main work is as follows:

(1) A high-throughput detection system for specific documents based on a cascade structure is proposed. First, the system receives document images from the library website's database and separates document images from non-document images by exploiting differences in their low-level image features. Second, a deep-learning-based classifier sorts the documents into two categories: suspicious and non-suspicious. Finally, layout analysis and OCR are used to determine whether a specific text appears in the relevant regions of the suspicious documents, which completes the detection of specific documents. Because it is built as a cascade, the system greatly improves detection speed while maintaining good detection accuracy.

(2) A lightweight classification network based on depthwise separable convolution and mixed convolution is proposed. Analysis of the system in (1) showed that it performs a large amount of redundant computation, which makes its computational cost too high for specific document detection on terminal devices. This thesis therefore takes the document classifier as its research object, studies lightweight network techniques, and redesigns the classifier using depthwise separable convolution and mixed convolution. The resulting lightweight document classification network replaces the original classifier in the system of (1), greatly reducing the model's parameters and computation while keeping detection accuracy within the range the task requires.
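The cascade in (1) gets its speed from letting cheap stages reject most images before the costlier classifier and OCR stages ever run. The Python sketch below illustrates only that early-exit control flow and is not the thesis's implementation. The dict-based "images", the feature name `text_pixel_ratio`, the thresholds, and the target phrase are all hypothetical stand-ins for real low-level feature extraction, a CNN classifier, and an OCR engine.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical: an "image" is a dict of precomputed signals standing in for
# pixel data, low-level features, a CNN score, and OCR output.
Image = Dict[str, object]


@dataclass
class Stage:
    name: str
    accept: Callable[[Image], bool]  # True -> pass the image on to the next stage


def run_cascade(image: Image, stages: Sequence[Stage]) -> Tuple[bool, str]:
    """Return (flagged, name of the last stage evaluated)."""
    for stage in stages:
        if not stage.accept(image):
            # Early rejection: later, more expensive stages are skipped.
            return False, stage.name
    return True, stages[-1].name


# Hypothetical target phrase and thresholds, chosen only for illustration.
TARGET_TEXT = "internal use only"

stages = [
    # Stage 1: cheap low-level test separating document from non-document images.
    Stage("doc_filter", lambda im: im["text_pixel_ratio"] > 0.2),
    # Stage 2: stand-in for the suspicious / non-suspicious CNN classifier score.
    Stage("suspicious_clf", lambda im: im["suspicious_score"] >= 0.5),
    # Stage 3: stand-in for layout analysis + OCR; look for the target text.
    Stage("ocr_check", lambda im: TARGET_TEXT in str(im["ocr_text"]).lower()),
]

batch: List[Image] = [
    {"text_pixel_ratio": 0.05, "suspicious_score": 0.9, "ocr_text": ""},  # photo
    {"text_pixel_ratio": 0.40, "suspicious_score": 0.1, "ocr_text": "Lecture notes"},
    {"text_pixel_ratio": 0.35, "suspicious_score": 0.8, "ocr_text": "Meeting agenda"},
    {"text_pixel_ratio": 0.50, "suspicious_score": 0.9, "ocr_text": "INTERNAL USE ONLY"},
]

results = [run_cascade(im, stages) for im in batch]
for flagged, stopped_at in results:
    print(flagged, stopped_at)
```

In this toy batch only the last image reaches and passes the OCR stage. The first image is discarded by the cheapest test, which mirrors why ordering stages from cheap to expensive raises throughput.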
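Counting weights shows why depthwise separable and mixed convolutions make a classifier lighter. The Python sketch below compares a standard k×k convolution with a depthwise separable one and with a MixConv-style mixed depthwise convolution, which splits the channels into groups that use different kernel sizes. It assumes stride 1, "same" padding, no bias, an even channel split, and that the mixed depthwise stage is followed by a 1×1 pointwise convolution. The layer sizes are illustrative and are not taken from the thesis.

```python
from typing import List, Sequence


def standard_conv_params(k: int, c_in: int, c_out: int) -> int:
    # One k x k x c_in filter per output channel (bias ignored).
    return k * k * c_in * c_out


def depthwise_separable_params(k: int, c_in: int, c_out: int) -> int:
    # Depthwise: one k x k filter per input channel.
    # Pointwise: a 1 x 1 convolution that mixes channels.
    return k * k * c_in + c_in * c_out


def split_channels(c: int, groups: int) -> List[int]:
    # Split c channels as evenly as possible; the first groups absorb the remainder.
    base, rem = divmod(c, groups)
    return [base + (1 if i < rem else 0) for i in range(groups)]


def mixconv_separable_params(kernels: Sequence[int], c_in: int, c_out: int) -> int:
    # Mixed depthwise conv: each channel group gets its own kernel size,
    # then (by assumption) a 1 x 1 pointwise convolution follows.
    groups = split_channels(c_in, len(kernels))
    depthwise = sum(k * k * c for k, c in zip(kernels, groups))
    return depthwise + c_in * c_out


def macs(params: int, h: int, w: int) -> int:
    # With stride 1 and "same" padding, each weight is used once per output position.
    return params * h * w


# Illustrative layer: 128 -> 128 channels on a 56 x 56 feature map.
c_in = c_out = 128
std = standard_conv_params(3, c_in, c_out)              # 147456
sep = depthwise_separable_params(3, c_in, c_out)        # 17536
mix = mixconv_separable_params([3, 5, 7], c_in, c_out)  # 19904
print(std, sep, mix)
print(round(sep / std, 4))                              # 0.1189
print(macs(std, 56, 56), macs(sep, 56, 56))
```

For this 3×3 layer the separable version needs about 1/C_out + 1/k² ≈ 12% of the standard layer's weights, and under these assumptions the same fraction of its multiply-accumulates. Adding 5×5 and 7×7 kernel groups enlarges the receptive field for only a modest increase over the plain separable layer.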
Keywords/Search Tags: Convolutional neural network, text detection, text recognition, lightweight network