Research On Otherness Comparison Method Of Historical Document Image Based On VGG Network

Posted on: 2021-01-10
Degree: Master
Type: Thesis
Country: China
Candidate: L B Zhai
GTID: 2428330647955115
Subject: Engineering
Abstract/Summary:
Ancient Chinese books and documents have significant historical and academic value, and comparing the Chinese characters in different versions of the same work to find where they differ (otherness comparison) is an important part of research on ancient books and Chinese characters. Traditional manual comparison suffers from low efficiency and low accuracy. With the rapid development of computer technology, computer-assisted otherness comparison of historical document images has become one of the best options for research on ancient Chinese books and documents. However, historical documents are handwritten: the characters have complex structures and variable styles, and cursive, intersecting, and touching characters appear frequently, which makes automatic otherness comparison very difficult. To address these problems, this thesis uses layout images from the Wenyuan Pavilion and Wenjin Pavilion versions of the Siku Quanshu as experimental objects and studies the key issues in otherness comparison of historical document images.

(1) Extraction of text components from historical document images
First, connected-area classification is combined with the lines detected by the LSD (Line Segment Detector) algorithm to filter the connected areas at the boundary of the layout image and obtain the frame components. The detected lines are then searched to find the approximate edge positions of the frame components, and a correction strategy refines these edge positions so that the frame can be removed from the document image. Finally, vertical projection analysis is used to resolve touching between text and dividing lines, which yields the text components of the historical document image.

(2) Character image segmentation for historical documents
A segmentation method based on the DeepLab V3+ semantic segmentation model is designed to handle characters that intersect or touch vertically and horizontally in document images. First, a semantic segmentation annotation method for ancient Chinese document images is designed that segments the approximate outline of a character into four kinds of polygonal regions, and it is used to create a semantic segmentation data set. The semantic segmentation model is then trained on this data set and applied to ancient Chinese character images to obtain the semantic segmentation result. Finally, a post-processing algorithm based on the principle of minimum-distance-priority merging is proposed to obtain images of single ancient Chinese characters.

(3) Otherness comparison of ancient Chinese layout images
A method based on the VGG (Visual Geometry Group) convolutional neural network is designed to compare the otherness of layout images of ancient Chinese documents. First, a VGG network model based on the features of ancient Chinese character images is established and trained on the constructed image database of ancient Chinese characters to obtain an otherness comparison classifier for ancient Chinese character images. Second, a comparison algorithm is built that uses this classifier to compare the corresponding Chinese character images of two historical documents, so as to realize otherness comparison and labeling of the two historical document images.

The experimental results show that the text component extraction algorithm achieves an accuracy of 89.5% and the ancient character segmentation algorithm achieves an accuracy of 93.3%. In the experiment on otherness comparison of historical document images, the otherness marking accuracy is 87.5%. The experiments show that the proposed method can effectively improve the accuracy of otherness comparison of document images.
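
As a rough illustration of the vertical projection analysis mentioned in step (1), the following Python sketch counts the foreground pixels in each column of a binarized image and splits the image wherever that count drops below a threshold. It is a minimal sketch under stated assumptions, not the thesis's algorithm: the abstract does not say how the projection is applied to dividing lines, and the function names, the 0/1 pixel encoding, and the min_ink parameter are all hypothetical.

```python
from typing import List, Tuple


def vertical_projection(binary: List[List[int]]) -> List[int]:
    """Count foreground pixels (value 1) in each column of a binary image."""
    if not binary:
        return []
    # zip(*binary) transposes the rows into columns.
    return [sum(column) for column in zip(*binary)]


def split_columns(binary: List[List[int]], min_ink: int = 1) -> List[Tuple[int, int]]:
    """Return (start, end) column ranges, end exclusive, whose projection
    is at least min_ink. Columns below min_ink are treated as gaps.
    min_ink is a hypothetical tuning parameter, not taken from the thesis."""
    profile = vertical_projection(binary)
    runs: List[Tuple[int, int]] = []
    start = None
    for x, ink in enumerate(profile):
        if ink >= min_ink and start is None:
            start = x                      # a run of inked columns begins
        elif ink < min_ink and start is not None:
            runs.append((start, x))        # the run ends at a gap column
            start = None
    if start is not None:
        runs.append((start, len(profile)))  # run reaches the right edge
    return runs


# Toy 3x7 binary image: two inked regions separated by empty columns.
page = [
    [0, 1, 1, 0, 0, 1, 0],
    [0, 1, 0, 0, 0, 1, 0],
    [0, 1, 1, 0, 0, 1, 1],
]
runs = split_columns(page)  # [(1, 3), (5, 7)]
```

Each returned range can then be cropped out as a candidate text component. A real layout image would need binarization and a threshold chosen for the scan resolution, neither of which is specified in the abstract.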
Keywords/Search Tags: Historical document image, Character image otherness comparison, Ancient Chinese character image segmentation, Image classification, VGG network model