
Mathematical Formula Extraction In Printed-Chinese Documents Based On EEN Feature Function

Posted on: 2018-07-03
Degree: Master
Type: Thesis
Country: China
Candidate: C N Hou
GTID: 2348330539985371
Subject: Computer Science and Technology
Abstract/Summary:
Mathematical retrieval places special demands on the collection and organization of scientific and technical literature that contains mathematical content. To meet these demands, this thesis studies a method for extracting mathematical formulae from printed-Chinese document images. First, the EEN (Edge to Edge Notation) feature function is defined. It reflects how connected components change across the image. A corresponding algorithm is designed to extract the distribution of coordinates in the horizontal and vertical directions. Second, the documents are preprocessed based on the values of this feature function, including noise removal and slant correction. Third, because the function reflects the distribution of image content in the horizontal and vertical directions intuitively and adequately, it is used to perform symbol-level layout analysis and to extract basic information about text lines. Finally, the layout features and content features of mathematical formulae are combined to design extraction algorithms for isolated formulae and embedded formulae that suit the characteristics of printed-Chinese document images. A corresponding algorithm is also designed to merge formula regions. Experimental results show that the method can discriminate between document layout components and can locate both isolated and embedded formulae within the document layout.
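The abstract does not give the formal definition of the EEN feature function. The Python below is a minimal sketch that assumes "edge to edge" means the span from a connected component's first row (or column) to its last. It labels 8-connected foreground components in a binary image and counts, for each row and column, how many component extents cover it. It then reads runs of nonzero rows as candidate text lines. The names `connected_components`, `een_profiles`, and `runs` are hypothetical, and this is not the thesis's own algorithm.

```python
from collections import deque

# Hypothetical interpretation of an "edge to edge" profile; the thesis's
# actual EEN definition is not given in the abstract.

def connected_components(img):
    """Label 8-connected foreground pixels (value 1) with a BFS flood fill.
    Returns one bounding box (top, left, bottom, right), inclusive, per component."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if img[y][x] != 1 or seen[y][x]:
                continue
            seen[y][x] = True
            queue = deque([(y, x)])
            top, left, bottom, right = y, x, y, x
            while queue:
                cy, cx = queue.popleft()
                top, bottom = min(top, cy), max(bottom, cy)
                left, right = min(left, cx), max(right, cx)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if 0 <= ny < h and 0 <= nx < w and img[ny][nx] == 1 and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


def een_profiles(boxes, h, w):
    """For each row (column), count the components whose edge-to-edge
    extent covers that row (column)."""
    rows = [0] * h
    cols = [0] * w
    for top, left, bottom, right in boxes:
        for y in range(top, bottom + 1):
            rows[y] += 1
        for x in range(left, right + 1):
            cols[x] += 1
    return rows, cols


def runs(profile):
    """Return (start, end) pairs of maximal nonzero runs, e.g. candidate text lines."""
    spans, start = [], None
    for i, v in enumerate(profile):
        if v and start is None:
            start = i
        elif not v and start is not None:
            spans.append((start, i - 1))
            start = None
    if start is not None:
        spans.append((start, len(profile) - 1))
    return spans


if __name__ == "__main__":
    page = [
        [1, 1, 0, 1, 0],
        [1, 0, 0, 1, 0],
        [0, 0, 0, 0, 0],
        [0, 1, 1, 1, 0],
    ]
    boxes = connected_components(page)
    rows, cols = een_profiles(boxes, len(page), len(page[0]))
    print(runs(rows))  # [(0, 1), (3, 3)]
```

In this sketch, rows where the profile drops to zero mark gaps between text lines. The abstract does not describe how the thesis turns such distributions into decisions about isolated or embedded formulae.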
Keywords: Printed-Chinese document images, Connected components, EEN feature function, Isolated formula extraction, Embedded formula extraction