
Improvement Of Frame Detection Algorithm In Table Recognition

Posted on: 2021-01-10
Degree: Master
Type: Thesis
Country: China
Candidate: S L Hao
Full Text: PDF
GTID: 2428330647960997
Subject: Computer software and theory
Abstract/Summary:
With the rapid development of the Internet, more and more corporate human resources departments have adopted paperless management, and employees' personal information and salary records are entered into computers as data. However, for some inter-agency collaboration services, and subject to restrictions such as confidentiality, employees and companies still print this information on paper forms for business collaboration. After receiving a paper form, the organization must enter its contents into the enterprise's internal information system. This entry work used to be done manually, but with the continuous growth in business volume in recent years, manual entry can no longer meet the timeliness requirements of the business, so automatic entry of paper form documents is increasingly important. Table detection and information recognition are central to this task, and a more effective table detection method with better results is needed.

Building on previous literature on table detection and recognition, this thesis studies the detection and restoration of table borders. It introduces the general process of table recognition: table border detection, table recognition, and table information extraction. For frame line detection, the thesis analyzes the original Hough transform frame line detection algorithm and finds that it requires a large amount of computation, is inefficient when the image has many pixels, and has high time and space complexity. The author also analyzes the traditional directed single connected chain algorithm and finds that its chain search and merge steps have high complexity.

This thesis proposes an improved directed single connected chain algorithm and a method for finding chains. The search starts from the black pixels in the upper left corner of the table and analyzes six cases to decide whether to continue the traversal. The chains found are then filtered: those with higher slopes and chain lengths less than 3 are discarded, which to a certain extent reduces the amount of computation in the later merging step. When merging directed connected chains, the author directly computes the average ordinate of the run center points in each of two single connected chains to decide whether the chains lie on the same line, which effectively reduces the original amount of computation.

For the false lines that appear among the detected frame lines, this thesis proposes a line quality evaluation criterion Q, whereas the traditional removal method is block comparison. The core idea of block comparison is to compare the table frame line image obtained by merging single connected chains with the original table image. Because its threshold must be set manually, which introduces human interference, this kind of removal cannot eliminate smaller pseudo lines. In contrast, this thesis keeps the lines with Q greater than 0.8; lines below this value are generally pseudo lines left by text.

After surveying current table restoration techniques, and since tables are mostly black-and-white images with a clear structure, this thesis uses a similarity algorithm based on perceptual hashing, one of the image similarity matching methods, to restore the table structure. For the removal of table frame lines, the thesis introduces three cases of intersection between characters and lines, and character trimming after the characters and lines are separated. Finally, the experimental results show that applying the line detection method of this thesis improves both the time and the accuracy of line detection to a certain extent.
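
The chain filtering and merging steps described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the `Run` record, the function names, the slope limit `max_slope`, and the distance threshold `max_dy` are all hypothetical, and only the minimum chain length of 3 comes from the abstract. The abstract does not say whether the slope and length conditions are applied jointly or separately; this sketch assumes a chain is dropped if either one holds. It also assumes horizontal lines built from vertical runs, and ignores the horizontal gap between chains.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Run:
    """One vertical run of black pixels in column x (hypothetical layout)."""
    x: int
    y_top: int
    y_bottom: int

    @property
    def center_y(self) -> float:
        # Ordinate of the run's center point.
        return (self.y_top + self.y_bottom) / 2.0


Chain = List[Run]  # runs ordered by increasing x


def mean_center_y(chain: Chain) -> float:
    """Average ordinate of the run center points in a chain."""
    return sum(r.center_y for r in chain) / len(chain)


def slope(chain: Chain) -> float:
    """Absolute slope between the first and last run centers."""
    dx = chain[-1].x - chain[0].x
    if dx == 0:
        return float("inf")
    return abs(chain[-1].center_y - chain[0].center_y) / dx


def filter_chains(chains: List[Chain], min_len: int = 3,
                  max_slope: float = 0.1) -> List[Chain]:
    """Drop chains shorter than min_len or steeper than max_slope.

    min_len=3 follows the abstract; max_slope is an assumed value.
    """
    return [c for c in chains if len(c) >= min_len and slope(c) <= max_slope]


def merge_chains(chains: List[Chain], max_dy: float = 2.0) -> List[Chain]:
    """Merge chains whose average center ordinates differ by at most max_dy.

    max_dy is an assumed threshold, not a value from the thesis.
    """
    merged: List[Chain] = []
    for chain in sorted(chains, key=mean_center_y):
        if merged and abs(mean_center_y(chain) - mean_center_y(merged[-1])) <= max_dy:
            # Same line: combine the runs and keep them ordered by x.
            merged[-1] = sorted(merged[-1] + chain, key=lambda r: r.x)
        else:
            merged.append(sorted(chain, key=lambda r: r.x))
    return merged
```

Comparing one average per chain replaces a run-by-run distance check, which is the kind of saving the abstract attributes to the improved merge step.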
Keywords/Search Tags: Table border detection, Form recognition, Frame removal, Table restoration