Research On Parsing And Indexing Postscript Files Of Digital Newspaper

Posted on: 2011-06-15
Degree: Master
Type: Thesis
Country: China
Candidate: M C Luo
GTID: 2178360308964147
Subject: Computer application technology
Abstract/Summary:
This thesis describes the design and implementation of the key algorithms used to produce digital newspapers from PostScript files. It improves several existing algorithms, such as the ray crossing method and the migration method for polygon aggregation, so that their implementations run more efficiently. It also proposes two new algorithms, both presented here for the first time: one generates a polygon for a text block in a PostScript file, and the other automatically indexes the regions of a newspaper page.

Because a PostScript file locates every word by its coordinates, a text block can be treated as a point set, so generating the polygon of that point set is equivalent to generating the polygon of the text block. This part of the work uses Graham scanning and the ray crossing method. The ray crossing method is improved by constructing a new coordinate system centered on the test point, which simplifies the step that counts crossing points.

The auto-indexing algorithm is designed around features of newspaper regions such as font size, word count, and position. It targets five kinds of regions: title, leading title, subtitle, body, and author, and it recognizes most of them. It reduces the manual labor involved in editing digital newspapers and periodicals and makes that editing more efficient.

All the algorithms and functions described in this thesis have been implemented in the "digital newspaper" project and have proved accurate.
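The abstract treats a text block as a set of word coordinates and builds its polygon with Graham scanning. It does not say how the points are extracted or whether the final enveloping polygon is always convex, so the following is only a minimal sketch, assuming the (x, y) word positions have already been pulled out of the PostScript file; it computes the convex hull of those points. The names `graham_scan` and `cross` are illustrative, not taken from the thesis.

```python
from functools import cmp_to_key
from typing import List, Tuple

Point = Tuple[float, float]


def cross(o: Point, a: Point, b: Point) -> float:
    """Z-component of (a - o) x (b - o); positive when o -> a -> b turns left."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def graham_scan(points: List[Point]) -> List[Point]:
    """Return the convex hull of `points` in counter-clockwise order."""
    pts = list(set(points))  # drop duplicate coordinates
    if len(pts) < 3:
        return sorted(pts)
    # Pivot: lowest y, then lowest x, so every other point lies at a
    # polar angle in [0, pi) around it.
    pivot = min(pts, key=lambda p: (p[1], p[0]))

    def dist2(p: Point) -> float:
        return (p[0] - pivot[0]) ** 2 + (p[1] - pivot[1]) ** 2

    def by_angle(a: Point, b: Point) -> int:
        # Compare polar angles with a cross product instead of atan2.
        c = cross(pivot, a, b)
        if c != 0:
            return -1 if c > 0 else 1
        # Collinear with the pivot: nearer point first.
        return -1 if dist2(a) < dist2(b) else 1 if dist2(a) > dist2(b) else 0

    others = sorted((p for p in pts if p != pivot), key=cmp_to_key(by_angle))
    hull = [pivot]
    for p in others:
        # Pop vertices that would make a right turn or a straight line.
        while len(hull) >= 2 and cross(hull[-2], hull[-1], p) <= 0:
            hull.pop()
        hull.append(p)
    return hull


# Hypothetical word positions of one text block.
block = [(0, 0), (4, 0), (4, 3), (0, 3), (2, 1), (1, 2), (3, 2)]
print(graham_scan(block))  # [(0, 0), (4, 0), (4, 3), (0, 3)]
```

Sorting with a cross-product comparator rather than `math.atan2` keeps the angular order exact for integer coordinates, and popping on a zero cross product drops points that lie on a hull edge.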
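The abstract says the ray crossing method is improved by building a new coordinate system on the test point and simplifying the crossing count, but gives no details. One plausible interpretation, sketched below, translates every vertex so the test point becomes the origin and casts the ray along the positive x-axis. An edge (a, b) then crosses the ray only if it straddles y = 0 and meets that line at x = (a_x b_y - b_x a_y) / (b_y - a_y) > 0, which reduces to a sign comparison with no division. The function name `point_in_polygon` is hypothetical, and points lying exactly on an edge are not classified consistently.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def point_in_polygon(p: Point, polygon: List[Point]) -> bool:
    """Even-odd ray crossing test with the test point moved to the origin."""
    px, py = p
    inside = False
    n = len(polygon)
    for i in range(n):
        # Translate both edge endpoints so the test point is the origin.
        ax, ay = polygon[i][0] - px, polygon[i][1] - py
        bx, by = polygon[(i + 1) % n][0] - px, polygon[(i + 1) % n][1] - py
        # Half-open rule: the edge counts only if exactly one endpoint lies
        # strictly above the x-axis, so a ray through a vertex is not
        # double-counted.
        if (ay > 0) != (by > 0):
            # The edge meets y = 0 at x = (ax*by - bx*ay) / (by - ay).
            # That x is positive iff numerator and denominator share a sign.
            if (ax * by - bx * ay > 0) == (by - ay > 0):
                inside = not inside
    return inside


square = [(0, 0), (4, 0), (4, 3), (0, 3)]
print(point_in_polygon((2, 1), square))  # True
print(point_in_polygon((5, 1), square))  # False
```

Because the test point sits at the origin, each edge needs only a few multiplications and comparisons, which is one way the counting step can be simplified.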
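The abstract names the features the auto-indexing algorithm uses (font size, word count, position) and the five target regions, but not the actual rules. The sketch below is therefore entirely hypothetical: it assumes the largest font in an article marks the title, the block with the most words sets the body font size, larger-than-body blocks above the title are leading titles and those below are subtitles, and short body-size blocks are author lines. `Block`, `classify_regions`, and `AUTHOR_MAX_WORDS` are illustrative names, and the threshold is an invented value.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical threshold: a block at body-text size with at most this many
# words (characters, for Chinese text) is treated as an author line.
AUTHOR_MAX_WORDS = 8


@dataclass
class Block:
    font_size: float  # dominant font size of the block, in points
    word_count: int   # number of words (or characters) in the block
    top: float        # y of the block's top edge; PostScript y grows upward


def classify_regions(blocks: List[Block]) -> List[str]:
    """Label each block of one article; the rules are illustrative only."""
    # Assume the title uses the largest font in the article.
    title = max(blocks, key=lambda b: b.font_size)
    # Assume the block with the most words is body text and sets the body size.
    body_size = max(blocks, key=lambda b: b.word_count).font_size
    labels = []
    for b in blocks:
        if b is title:
            labels.append("title")
        elif b.font_size > body_size and b.top > title.top:
            labels.append("leading title")  # larger than body, above the title
        elif b.font_size > body_size:
            labels.append("subtitle")  # larger than body, not above the title
        elif b.word_count <= AUTHOR_MAX_WORDS:
            labels.append("author")  # short block at body size or smaller
        else:
            labels.append("body")
    return labels


article = [
    Block(14, 6, 700),    # leading title
    Block(28, 8, 680),    # title
    Block(16, 10, 650),   # subtitle
    Block(10, 3, 630),    # author
    Block(10, 400, 600),  # body
]
print(classify_regions(article))
# ['leading title', 'title', 'subtitle', 'author', 'body']
```

Rules this simple would mislabel, for example, a very short final body paragraph as an author line, which is consistent with the abstract's claim that most, not all, regions are recognized.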
Keywords/Search Tags: PS file, Graham scanning, ray crossing algorithm, enveloping polygon, auto indexing