Research On Key Technologies Of Graphic Analysis And Recognition In Document Image

Posted on: 2016-04-04
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Z L Zhang
GTID: 1108330503969594
Subject: Computer application technology
Abstract/Summary:
Documents, as carriers of information, are widely used in social life. To use and manage document information efficiently, researchers have studied document processing technologies since the 1960s. A scanner or document processing system converts a paper document into a document image, which can then be stored, managed, and transmitted efficiently. As the number of document images grows, document image analysis and recognition has attracted many researchers internationally. Many methods have been proposed and improved, but some problems remain unsolved. Documents can be classified into text documents and graphic documents. This thesis targets graphic documents, represented by engineering drawings, and addresses two problems that arise when analyzing them: the recognition of graphic primitives (such as lines, arcs, and curves) and the recognition of graphic symbols. The goal is to recognize and interpret the graphic objects in engineering drawings effectively.

Engineering drawings contain many complex elements, such as lines, arcs, curves, and graphic symbols. First, line and arc recognition methods are used to identify the type of each element in the image. Existing recognition algorithms give unsatisfactory results when the image content is very complex. This thesis proposes an arc recognition method that can be generalized to recognize ellipses and parabolas. All methods were tested on data and found to be effective.

When analyzing and interpreting images, knowing the type of a graphic element is not enough; its parameters must also be computed. Because engineering drawings contain many circular elements, this thesis mainly studies recognition and analysis methods for circular graphics.
Computing the parameters of circles (radius and center coordinates) is crucial for recognizing and interpreting document images that contain circles. Traditional methods cannot obtain accurate points from which to compute these parameters. This thesis presents a fitting-based method for computing the parameters of circles. The method first finds proper seed points according to whether the line width is odd or even, and then computes the parameters of arcs by combining an improved circle fitting method with the appropriate seed points. The method can be generalized to compute the parameters of ellipses and parabolas. All algorithms were tested on data and found to be effective.

Intersecting and touching graphic elements are very common in images, and arcs that intersect or touch other elements are a representative case. Obtaining the parameters of an arc is very important when interpreting images. Under intersection and touching, the center coordinates and radius of a circle are difficult to compute because proper seed points cannot be found, and the parameters of a partial circle (center coordinates, radius, start angle, end angle) are even harder to compute. Existing methods cannot handle intersecting and touching graphic elements. This thesis therefore presents Sym CAve, an arc segmentation algorithm based on geometric properties (Sym CAve abbreviates Symmetry axis, Circle fitting, and Average Distribution Point). It builds on the fitting-based parameter computation method. First, two auxiliary concentric circles are used to compute seed points on the target arc, and three strategies are used to remove the noise points created by intersection and touching. Environmental information (the symmetry axis) is then used to adjust the parameters. Finally, evenly distributed auxiliary points computed from the circle's parameters are used to judge whether the target arc is a full circle or a partial circle.
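As a rough illustration of the two ideas described above (fitting a circle to seed points, then sampling evenly spaced points on the fitted circle to decide between a full and a partial circle), the following Python sketch uses a plain algebraic least-squares fit. It is not the thesis's improved fitting method or its Sym CAve algorithm; the function names `fit_circle` and `arc_coverage`, the sample count, and the pixel tolerance are all assumptions made for the example.

```python
import numpy as np


def fit_circle(points):
    """Algebraic least-squares circle fit (illustrative, not the thesis's method).

    Solves x^2 + y^2 + D*x + E*y + F = 0 for D, E, F and returns (cx, cy, r).
    `points` is an (N, 2) array-like of (x, y) coordinates, N >= 3.
    """
    pts = np.asarray(points, dtype=float)
    x, y = pts[:, 0], pts[:, 1]
    A = np.column_stack([x, y, np.ones_like(x)])
    b = -(x ** 2 + y ** 2)
    (D, E, F), *_ = np.linalg.lstsq(A, b, rcond=None)
    cx, cy = -D / 2.0, -E / 2.0
    r = np.sqrt(cx ** 2 + cy ** 2 - F)
    return cx, cy, r


def arc_coverage(image, cx, cy, r, n_samples=72, tol=1):
    """Sample evenly spaced points on the circle and test each against the image.

    `image` is a 2D boolean array (True = foreground), indexed as image[y, x].
    A sample counts as a hit if any foreground pixel lies within `tol` pixels.
    Returns the sample angles (radians) and a boolean hit array.
    """
    h, w = image.shape
    angles = np.linspace(0.0, 2.0 * np.pi, n_samples, endpoint=False)
    hits = np.zeros(n_samples, dtype=bool)
    for i, a in enumerate(angles):
        px = int(round(cx + r * np.cos(a)))
        py = int(round(cy + r * np.sin(a)))
        x0, x1 = max(px - tol, 0), min(px + tol + 1, w)
        y0, y1 = max(py - tol, 0), min(py + tol + 1, h)
        if x1 > x0 and y1 > y0:  # skip samples that fall outside the image
            hits[i] = image[y0:y1, x0:x1].any()
    return angles, hits
```

A hit fraction (`hits.mean()`) near 1 suggests a full circle. A clearly lower value suggests a partial circle, whose start and end angles lie near the transitions between hits and misses.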
If it is a partial circle, more auxiliary points are distributed around its two ends to obtain accurate start and end angles. This method can be generalized to segment ellipses and parabolas. The experiments use the standard evaluation data provided by the arc segmentation contest of the International Association for Pattern Recognition, and algorithm performance is analyzed with the standard evaluation tool provided by the contest. The results show that the Sym CAve algorithm performs better than existing algorithms. The ellipse and parabola segmentation algorithms were also tested on a dataset, and the results are promising.

Engineering drawings contain many symbols that vary considerably in shape and size, and some symbols are rotated or corrupted by noise. Structure-based methods are affected by noise during vectorization, while statistics-based methods cannot handle rotation. This thesis therefore proposes a feature extraction method, the key points-based statistical integration of constraint histogram, and a features-based multi-graph semi-supervised method for recognizing engineering drawing symbols. The recognition method uses three features: the key points-based statistical integration of constraint histogram feature proposed in this thesis, Zernike moments, and Tchebichef moments. The key points-based statistical integration of constraint histogram feature combines the advantages of structural and statistical methods. Zernike and Tchebichef moments are powerful descriptors that are robust when recognizing rotated and scaled symbols. The experimental datasets include the data used in the symbol recognition contest and an open logo library from the University of Maryland. First, sixteen moments are compared on their performance in recognizing symbols and logos. The results show that, of the sixteen, Tchebichef and Zernike moments are the most suitable for describing symbols.
Then the relation between moment order and recognition rate is studied: once the order reaches a critical value, the recognition rate no longer increases. Finally, the performance of the features-based multi-graph semi-supervised engineering drawing symbol recognition method is analyzed; its recognition rate is ten percent higher than that of the moment-based approach. Because using three feature extraction methods lowers efficiency, two fast moment computation methods are proposed, which are faster than the original methods.
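The rotation robustness attributed to Zernike moments in this abstract can be checked numerically. The sketch below computes the magnitude of a single Zernike moment from the standard textbook definition on the unit disk. It is not one of the fast computation methods proposed in the thesis, and `radial_poly` and `zernike_magnitude` are hypothetical helper names. It assumes a square grayscale image and a valid index pair (n - |m| even, |m| <= n).

```python
import numpy as np
from math import factorial


def radial_poly(n, m, rho):
    """Zernike radial polynomial R_nm(rho); requires n - |m| even and |m| <= n."""
    m = abs(m)
    out = np.zeros_like(rho, dtype=float)
    for k in range((n - m) // 2 + 1):
        coeff = ((-1) ** k * factorial(n - k)
                 / (factorial(k)
                    * factorial((n + m) // 2 - k)
                    * factorial((n - m) // 2 - k)))
        out += coeff * rho ** (n - 2 * k)
    return out


def zernike_magnitude(img, n, m):
    """Magnitude of the Zernike moment Z_nm of a square 2D array.

    Pixel centers are mapped onto [-1, 1] x [-1, 1]; pixels outside the
    unit disk are ignored. The sum is a simple discrete approximation.
    """
    h, w = img.shape
    ys, xs = np.mgrid[0:h, 0:w]
    x = (2.0 * xs - (w - 1)) / (w - 1)
    y = (2.0 * ys - (h - 1)) / (h - 1)
    rho = np.hypot(x, y)
    theta = np.arctan2(y, x)
    mask = rho <= 1.0
    # Complex conjugate of the basis function V_nm = R_nm(rho) * exp(i*m*theta)
    basis = radial_poly(n, m, rho) * np.exp(-1j * m * theta)
    z = (n + 1) / np.pi * np.sum(img[mask] * basis[mask])
    return abs(z)
```

Rotating the input by 90 degrees with `np.rot90` changes only the phase of the moment, so `zernike_magnitude` returns the same value up to floating-point error. For other angles the agreement is approximate because the image must be resampled.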
Keywords/Search Tags: document image, graphic recognition, symbol recognition, arc segmentation, multi-features