
Research On Breaking Algorithm Of Distortion And Adhesion Text-based CAPTCHA

Posted on: 2020-05-16 | Degree: Master | Type: Thesis
Country: China | Candidate: Q Y Song
GTID: 2428330602450571 | Subject: Operational Research and Cybernetics
Abstract/Summary:
CAPTCHA (Completely Automated Public Turing test to tell Computers and Humans Apart) is widely used by websites and helps protect network security to a certain extent: it prevents computer programs from cracking passwords by brute force and blocks other malicious attacks. CAPTCHA recognition techniques were developed to verify the security and reliability of CAPTCHAs. Many CAPTCHA types now exist, including text, image, voice, and slider CAPTCHAs. This paper mainly studies the recognition of text-based CAPTCHAs, whose many schemes call for a variety of breaking methods. Depending on whether segmentation is used, this paper adopts two frameworks: the first and second parts of the work use the “segmentation + recognition” framework, and the third uses deep learning for end-to-end recognition. The CAPTCHAs of the school employment information network, Jingdong Mall, and Tencent are selected for recognition. The specific work is as follows.

First, because the traditional single-character segmentation method has obvious disadvantages on the employment information network CAPTCHA, an improved segmentation algorithm is proposed. Connected characters are first marked using connected components, and prior knowledge of the CAPTCHA is used to determine whether the number of characters is correct; vertical projection is then used to obtain complete characters. This method effectively solves the problem that shadowed characters are difficult to segment correctly.

Second, because traditional segmentation methods cannot segment touching characters correctly, a segmentation algorithm based on an improved drop-fall algorithm is proposed. The Zhang-Suen thinning algorithm and clustering of the touching region with self-organizing maps are used to find the starting drop point of the drop-fall algorithm, and a new drop path is defined to improve it: the water drop falls from the starting drop point along the skeleton of the overlapping stroke and, at the end of that skeleton, continues along the skeleton's slant direction until it meets the boundary of the connected character region. This drop path is taken as the segmentation path for the touching characters. Compared with the traditional drop-fall algorithm and the vertical segmentation method, the improved algorithm segments the Jingdong Mall CAPTCHA more accurately. An 8-layer convolutional neural network is then used to recognize the single characters, with a recognition time of about 0.46 seconds and a recognition rate of 88%.

Third, the Tencent CAPTCHA contains distortion, adhesion, hollow characters, and large shadow-block noise, which make it difficult to recognize correctly with the “segmentation + recognition” framework. This paper therefore uses a convolutional neural network to recognize the Tencent CAPTCHA directly, without any pre-processing, applying a transfer learning method based on a pre-trained Xception network. The experimental results show that the convolutional neural network recognizes complex CAPTCHAs well, with a single recognition rate of 75%.
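The first-stage idea (label connected components, check the character count against prior knowledge of the CAPTCHA, then cut with vertical projection) can be sketched in pure Python as below. This is a minimal illustration under assumptions the abstract does not state: the image is already binarized into rows of 0/1 values, 8-connectivity is used, and components smaller than a hypothetical `min_pixels` threshold are treated as noise. All function names are hypothetical and are not taken from the thesis.

```python
from collections import deque


def label_components(img):
    """Return the 8-connected foreground components of a 0/1 image as lists of (y, x)."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    components = []
    for y in range(h):
        for x in range(w):
            if not img[y][x] or seen[y][x]:
                continue
            seen[y][x] = True
            queue, pixels = deque([(y, x)]), []
            while queue:  # breadth-first flood fill over the 8 neighbours
                cy, cx = queue.popleft()
                pixels.append((cy, cx))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if 0 <= ny < h and 0 <= nx < w and img[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            components.append(pixels)
    return components


def projection_spans(img):
    """Vertical projection: half-open column ranges [start, end) with a non-zero column sum."""
    h, w = len(img), len(img[0])
    profile = [sum(img[y][x] for y in range(h)) for x in range(w)]
    spans, start = [], None
    for x, count in enumerate(profile):
        if count and start is None:
            start = x
        elif not count and start is not None:
            spans.append((start, x))
            start = None
    if start is not None:
        spans.append((start, w))
    return spans


def segment_captcha(img, n_chars, min_pixels=3):
    """Hypothetical pipeline: drop tiny components as noise, cut by vertical projection,
    and accept the cut only if it matches the known number of characters (else None)."""
    h, w = len(img), len(img[0])
    clean = [[0] * w for _ in range(h)]
    for comp in label_components(img):
        if len(comp) >= min_pixels:
            for y, x in comp:
                clean[y][x] = 1
    spans = projection_spans(clean)
    return spans if len(spans) == n_chars else None


if __name__ == "__main__":
    sample = [
        [1, 1, 0, 0, 1, 1, 0, 0, 0, 0],
        [1, 1, 0, 0, 1, 1, 0, 0, 0, 0],
        [1, 1, 0, 0, 0, 0, 0, 0, 1, 0],  # the lone pixel at x=8 is noise
        [1, 1, 0, 0, 1, 1, 0, 0, 0, 0],
        [0, 0, 0, 0, 1, 1, 0, 0, 0, 0],
    ]
    print(segment_captcha(sample, n_chars=2))  # [(0, 2), (4, 6)]
```

In the sample, the second character is broken into two vertically separated pieces (as a shadowed or faint stroke might be); the component step removes the noise pixel, and the projection step merges the two pieces back into one column span. A `None` result signals that the count disagrees with the expected length and the image needs a different split.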
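The Zhang-Suen thinning step mentioned above is a standard two-subiteration algorithm; a direct pure-Python version is sketched below for a binary image stored as rows of 0/1 values. It is only the textbook thinning procedure: how the thesis combines the resulting skeleton with self-organizing-map clustering to choose the starting drop point is not shown. Treating pixels outside the image as background is an assumption of this sketch.

```python
def zhang_suen_thin(img):
    """Zhang-Suen thinning of a 0/1 image (list of rows); returns a new, thinned image."""
    img = [row[:] for row in img]  # work on a copy so the input is left untouched
    h, w = len(img), len(img[0])

    def px(y, x):
        # out-of-bounds pixels count as background
        return img[y][x] if 0 <= y < h and 0 <= x < w else 0

    def neighbours(y, x):
        # P2..P9, clockwise, starting from the pixel directly above
        return [px(y - 1, x), px(y - 1, x + 1), px(y, x + 1), px(y + 1, x + 1),
                px(y + 1, x), px(y + 1, x - 1), px(y, x - 1), px(y - 1, x - 1)]

    changed = True
    while changed:
        changed = False
        for subiteration in (0, 1):
            to_delete = []
            for y in range(h):
                for x in range(w):
                    if not img[y][x]:
                        continue
                    p = neighbours(y, x)
                    p2, p4, p6, p8 = p[0], p[2], p[4], p[6]
                    b = sum(p)  # number of foreground neighbours
                    # number of 0 -> 1 transitions in the cyclic sequence P2..P9, P2
                    a = sum(1 for i in range(8) if p[i] == 0 and p[(i + 1) % 8] == 1)
                    if subiteration == 0:
                        cond = p2 * p4 * p6 == 0 and p4 * p6 * p8 == 0
                    else:
                        cond = p2 * p4 * p8 == 0 and p2 * p6 * p8 == 0
                    if 2 <= b <= 6 and a == 1 and cond:
                        to_delete.append((y, x))
            # delete only after scanning, so every test in a subiteration sees the same image
            for y, x in to_delete:
                img[y][x] = 0
            if to_delete:
                changed = True
    return img


if __name__ == "__main__":
    bar = [[1 if 1 <= y <= 3 and 1 <= x <= 7 else 0 for x in range(9)] for y in range(5)]
    for row in zhang_suen_thin(bar):
        print("".join("#" if v else "." for v in row))
```

On the 3-pixel-thick bar in the demo, the result is a 1-pixel-wide line along the middle row (columns 2 to 5), one pixel shorter than the bar at each end, which is typical of Zhang-Suen thinning.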
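For reference, the traditional drop-fall baseline that the thesis improves on can be sketched as follows. The movement priorities used here (down, down-left, down-right, then sideways, otherwise cut straight down) are one common variant chosen for illustration; published drop-fall variants differ in these rules. The thesis's improved path, which follows the overlap-stroke skeleton and then its slant direction, is not implemented here, and the function name is hypothetical.

```python
def drop_fall(img, start_x):
    """Traditional drop-fall baseline on a 0/1 image (1 = ink).

    Simulates a drop released at (0, start_x) and returns its path, top to bottom,
    as a list of (y, x) positions. The path is the cut between touching characters.
    """
    h, w = len(img), len(img[0])

    def free(y, x):
        # a background pixel inside the image
        return 0 <= x < w and img[y][x] == 0

    y, x, prev = 0, start_x, None
    path = [(y, x)]
    while y < h - 1:
        if free(y + 1, x):
            step = (y + 1, x)        # fall straight down
        elif free(y + 1, x - 1):
            step = (y + 1, x - 1)    # slide down-left
        elif free(y + 1, x + 1):
            step = (y + 1, x + 1)    # slide down-right
        elif free(y, x + 1) and prev != (y, x + 1):
            step = (y, x + 1)        # roll right, never straight back
        elif free(y, x - 1) and prev != (y, x - 1):
            step = (y, x - 1)        # roll left, never straight back
        else:
            step = (y + 1, x)        # trapped: cut down through the ink
        prev = (y, x)
        y, x = step
        path.append((y, x))
    return path


if __name__ == "__main__":
    touching = [
        [1, 1, 0, 0, 0, 1, 1],
        [1, 1, 1, 0, 1, 1, 1],
        [0, 1, 1, 1, 1, 1, 0],
        [0, 1, 1, 1, 1, 1, 0],
        [0, 1, 1, 0, 1, 1, 0],
    ]
    print(drop_fall(touching, start_x=3))
```

Pixels to the left of the returned path can be assigned to the left character and the rest to the right one. Forbidding an immediate return to the previous cell means that, within a row, the drop can only keep rolling in one direction or move down, so the loop always terminates.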
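For the end-to-end route, one common way to train a CNN on fixed-length CAPTCHAs is to concatenate one one-hot vector per character position and decode by taking the arg-max within each position's block of scores. The sketch below shows only that label encoding and the two usual accuracy measures. The 36-symbol character set and the length of 4 are assumptions, not details from the thesis, and the network itself (for example a pre-trained Xception backbone with a new output layer) is omitted.

```python
import string

CHARSET = string.digits + string.ascii_uppercase  # assumed 36-symbol alphabet
N_CHARS = 4                                        # assumed fixed CAPTCHA length


def encode_label(text):
    """Flatten a CAPTCHA string into N_CHARS concatenated one-hot vectors."""
    if len(text) != N_CHARS:
        raise ValueError(f"expected {N_CHARS} characters, got {len(text)}")
    k = len(CHARSET)
    vec = [0.0] * (N_CHARS * k)
    for pos, ch in enumerate(text):
        vec[pos * k + CHARSET.index(ch)] = 1.0  # raises ValueError for unknown symbols
    return vec


def decode_scores(scores):
    """Arg-max inside each position's block of len(CHARSET) scores."""
    k = len(CHARSET)
    chars = []
    for pos in range(N_CHARS):
        block = scores[pos * k:(pos + 1) * k]
        chars.append(CHARSET[max(range(k), key=block.__getitem__)])
    return "".join(chars)


def char_accuracy(preds, labels):
    """Fraction of individual characters predicted correctly."""
    hits = sum(a == b for p, t in zip(preds, labels) for a, b in zip(p, t))
    return hits / sum(len(t) for t in labels)


def sequence_accuracy(preds, labels):
    """Fraction of CAPTCHAs whose whole string is correct."""
    return sum(p == t for p, t in zip(preds, labels)) / len(labels)


if __name__ == "__main__":
    print(decode_scores(encode_label("A7Z0")))
    preds, labels = ["AB12", "CD34"], ["AB12", "CD35"]
    print(char_accuracy(preds, labels), sequence_accuracy(preds, labels))
```

Whole-string accuracy can never exceed per-character accuracy, since every fully correct CAPTCHA contributes only correct characters, so it matters which of the two a reported recognition rate uses.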
Keywords/Search Tags: CAPTCHA, character segmentation, connected characters, drop-fall, CNN