
Analysis Of Complex Layout Based On Machine Learning

Posted on: 2019-07-11  Degree: Master  Type: Thesis
Country: China  Candidate: B C Xu  Full Text: PDF
GTID: 2428330545990152  Subject: Computer technology
Abstract/Summary:
Simple page layouts can now be analyzed and recognized reliably with OCR technology. With the continuous development of information technology, however, text image layouts have changed from simple combinations of text and graphics into complex forms that mix text, tables, graphics, and images. Such complex layouts directly degrade OCR recognition, and in some cases OCR cannot be applied at all. Effective analysis of complex layouts is therefore necessary before OCR recognition.

Traditional layout analysis methods, such as the connected-component method and the projection method, have become increasingly impractical: their running time is too long and their accuracy too low to meet users' needs. In recent years, many methods from pattern recognition and machine learning have been applied to layout analysis, and the support vector machine (SVM) is one of them. An SVM is based on the structural risk minimization principle and uses a kernel function to realize a nonlinear mapping from a low-dimensional space to a high-dimensional one. Structural risk minimization helps avoid overfitting, which improves the generalization ability of the learning machine and lets it deal effectively with complex layouts.

This thesis proposes a machine-learning-based method for analyzing images with complex layouts, such as newspapers, scientific papers, and web pages. The main contents are as follows:

1. Starting from the key features of the image, statistical features of gray level, shape, texture, and phase consistency are selected. For the texture features, the thesis studies the gray-level co-occurrence matrix (GLCM) algorithm and improves it by combining it with the sum and difference histograms method. Training and testing show that this reduces the computing time of feature extraction and achieves good results in layout analysis.

2. To segment layouts against complex backgrounds, two methods, a support vector machine (SVM) and a BP neural network, are used to segment and extract the text areas in the image. The two are then compared and analyzed on the basis of the experimental results.

3. For classifying the segmented page regions, multi-class classification based on the support vector machine (SVM) is studied in detail. In addition, to address the problem of non-separable regions, an improved method based on a distance measure is proposed. Tests show that the improved method further raises the accuracy of layout classification.

Experiments show that the method used in this thesis can analyze images with complex layouts effectively, and that it is both fast and accurate. It therefore has some application prospects.
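
The sum and difference histograms idea mentioned in item 1 can be illustrated with a minimal sketch. It follows the commonly used formulation in which two one-dimensional histograms, of the sums and of the differences of gray levels at a fixed displacement, stand in for the two-dimensional co-occurrence matrix. The abstract does not give the thesis's exact variant, so the function name `sum_diff_features`, the displacement parameters `dx` and `dy`, and the choice of five features are assumptions made for illustration only.

```python
import math
from collections import Counter


def sum_diff_features(img, dx=1, dy=0):
    """Texture features from sum and difference histograms.

    img is a list of rows of non-negative integer gray levels.
    (dx, dy) is a non-negative pixel displacement; the name and the
    defaults are hypothetical, not taken from the thesis.
    """
    h, w = len(img), len(img[0])
    sums, diffs = Counter(), Counter()
    # Visit every pixel pair separated by the displacement (dx, dy).
    for y in range(h - dy):
        for x in range(w - dx):
            a, b = img[y][x], img[y + dy][x + dx]
            sums[a + b] += 1
            diffs[a - b] += 1
    n = sum(sums.values())
    if n == 0:
        raise ValueError("image is too small for this displacement")
    # Normalise the counts into probability distributions.
    ps = {s: c / n for s, c in sums.items()}
    pd = {d: c / n for d, c in diffs.items()}
    return {
        "mean": 0.5 * sum(s * p for s, p in ps.items()),
        "contrast": sum(d * d * p for d, p in pd.items()),
        "homogeneity": sum(p / (1 + d * d) for d, p in pd.items()),
        "energy": sum(p * p for p in ps.values())
                  * sum(p * p for p in pd.values()),
        "entropy": -sum(p * math.log(p) for p in ps.values())
                   - sum(p * math.log(p) for p in pd.values()),
    }


if __name__ == "__main__":
    # Vertical stripes: strong horizontal contrast, none vertically.
    stripes = [[0, 1, 0, 1] for _ in range(4)]
    print(sum_diff_features(stripes, dx=1, dy=0))
    print(sum_diff_features(stripes, dx=0, dy=1))
```

The saving in this formulation comes from storage and summation: for G gray levels each histogram has at most 2G - 1 bins, whereas a co-occurrence matrix has G × G entries that must all be visited when computing features.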
Keywords/Search Tags: layout analysis, machine learning, pattern recognition, feature extraction, support vector machine