Research On Key Technologies Of OCR For Cloud Desktop Pictures

Posted on: 2019-01-26  Degree: Master  Type: Thesis
Country: China  Candidate: S M Duan  Full Text: PDF
GTID: 2428330566497317  Subject: Computer Science and Technology
Abstract/Summary:
Desktop auditing in a cloud computing environment helps prevent insider threats. The traditional approach is to record desktop operations as video and review the recordings manually, which consumes a great deal of manpower and time. Currently, remote desktop information is mainly transmitted as bitmaps through the Remote Desktop Protocol (RDP): as the user operates the remote desktop, fragments of the desktop picture are returned. For this reason, most studies use Optical Character Recognition (OCR) to extract text from these images in order to index and audit user behavior. OCR-based text extraction generally proceeds in two stages, text detection and text recognition. In recent years, with the development of deep learning, its performance in both stages has surpassed that of traditional methods. However, applying the best-performing horizontal text detection and text recognition algorithms to cloud desktop images raises several problems: 1. Real-scene training data is seriously insufficient; 2. Text detection may produce a large number of false detections, because icons have shapes and sizes similar to text; 3. Character recognition performs poorly on long sequences that mix Chinese and English text.

In response to these issues, the main research contents and achievements of this thesis are as follows: 1. In the text detection training phase, the model is first trained on data from a Chinese text detection contest; once it has acquired a certain text detection capability, its parameters are fine-tuned with a small amount of manually annotated data. For the character recognition model, training pictures are generated with a picture synthesis method; 2. To distinguish between icons and text, the text detection network structure is adjusted: two different anchors are designed, and the entire text box is regressed at once. The new network improves text detection accuracy on cloud desktop images; 3. An attention mechanism is introduced in the decoding stage of the text recognition model, so that the model focuses on the relevant image feature area when predicting each output character. The attention mechanism removes the limitation that the decoder of the text recognition model is constrained by a fixed-length input, and it strengthens recognition of long text pictures.

In the experiments, the improved text detection model was tested on cloud desktop images, and text detection accuracy increased from 55.8% to 76.5%. When the improved text detection and text recognition models were combined to address the cloud desktop auditing problem, text recognition accuracy increased from 80.6% to 91.6%.
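The attention step described in item 3 can be illustrated with a minimal sketch. The abstract does not say which scoring function or network layers the thesis uses, so everything here is an assumption made for illustration: plain dot-product scoring, the hypothetical names `softmax` and `attend`, and the toy feature vectors. The idea shown is only that, at each decoding step, the decoder state is compared with every image feature column, the scores are normalized into weights, and the weighted sum becomes the context used to predict the next character.

```python
import math

def softmax(scores):
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(decoder_state, encoder_features):
    # Assumed scoring function: dot product between the decoder state
    # and each encoder feature vector (one vector per image column).
    scores = [sum(q * k for q, k in zip(decoder_state, feat))
              for feat in encoder_features]
    weights = softmax(scores)
    # Context vector: attention-weighted sum of the feature vectors.
    dim = len(encoder_features[0])
    context = [sum(w * feat[i] for w, feat in zip(weights, encoder_features))
               for i in range(dim)]
    return weights, context

# Toy input: three feature columns of dimension 2 (illustrative values only).
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
state = [0.0, 2.0]
weights, context = attend(state, features)
```

Because the weights are recomputed at every decoding step over however many feature columns the image yields, the decoder in such a scheme does not depend on a single fixed-length summary of the input, which is the property the abstract credits for better results on long text pictures.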
Keywords/Search Tags:Cloud Desktop Audit, OCR, Deep Learning, Text Detection, Character Recognition