
Research On Chinese Character Recognition Algorithm Under Scrambling Conditions

Posted on: 2020-10-23
Degree: Master
Type: Thesis
Country: China
Candidate: J H Qi
Full Text: PDF
GTID: 2438330599455719
Subject: Communication and Information System
Abstract/Summary:
Mature methods exist for Chinese character recognition, but text images containing sensitive vocabulary on the Internet are often scrambled to evade detection by software systems. Because scrambling adds interference, traditional optical character recognition (OCR) technology cannot effectively recognize such images. Research on recognizing scrambled text images is therefore significant for the security of information content and of information dissemination. Building on traditional OCR, this thesis proposes a new algorithm for recognizing text images after scrambling. The algorithm has four steps. First, de-interference, grayscale conversion, and binarization are applied to preprocess the text image and remove image interference. Second, a single-character segmentation algorithm combining the projection method with prior knowledge is proposed to split the text image into single-character images. Third, a single-character recognition algorithm based on dynamic time warping (DTW) is proposed to match each single-character image and produce a preliminary recognition result. Fourth, a feature vector based on the structural characteristics of Chinese characters and a stroke-based glyph similarity algorithm are proposed and used to build a library of visually similar characters (near-characters); to address the shortcomings of traditional post-processing, a post-processing method combining near-characters with a language model is proposed, which corrects errors in the preliminary recognition result and outputs the best match.

Experiments verify the reliability, adaptability, anti-interference capability, and overall effectiveness of the proposed algorithms. The results show that, without interference, the accuracy of the proposed algorithm is below that of Baidu OCR and Tencent OCR. At a scrambling density of 0.02, it is 4.2% higher than Baidu OCR and 5.7% higher than Tencent OCR; at a scrambling density of 0.05, it is 4.2% higher than Baidu OCR and 6.6% higher than Tencent OCR. Overall, the results show that the preprocessing, single-character segmentation, single-character recognition, and post-processing algorithms described in this thesis are more effective when the interference factor is small.
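
The DTW-based matching step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say which features the thesis feeds into DTW, so this example assumes, purely for illustration, that each binarized single-character image is summarized by its column projection profile (the count of foreground pixels per column). The function names `column_profile`, `dtw_distance`, and `recognize` are hypothetical and do not come from the thesis.

```python
from typing import Dict, List, Sequence


def column_profile(image: Sequence[Sequence[int]]) -> List[int]:
    """Count foreground (1) pixels in each column of a binary image.

    Assumption: this profile stands in for whatever feature sequence
    the thesis actually uses.
    """
    return [sum(col) for col in zip(*image)]


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Classic dynamic time warping with |x - y| as the local cost."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = cheapest alignment of a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the best of: step in a only, step in b only, step in both.
            cost[i][j] = local + min(
                cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1]
            )
    return cost[n][m]


def recognize(query: Sequence[float],
              templates: Dict[str, Sequence[float]]) -> str:
    """Return the label of the template closest to the query under DTW."""
    return min(templates, key=lambda label: dtw_distance(query, templates[label]))
```

Because DTW lets one sequence stretch or compress relative to the other, a character whose strokes are slightly widened or shifted can still align closely with its template, which is a plausible reason for choosing it over rigid pixel-wise comparison. Under these assumptions, recognition reduces to calling `recognize(column_profile(char_image), templates)` with a dictionary of precomputed template profiles.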
Keywords/Search Tags:pattern matching, dynamic time warping, near-word, post-processing