| As an efficient way to organize and present data, tables are widely used in industry and everyday life. Tabular data now accounts for about half of the data that most practitioners handle in their daily office work, for example the annual financial statements of listed companies, the case files of local courts, and the registration forms of national examinations. The sheer volume of tabular data in such documents makes efficient table processing important, and table recognition technology is therefore receiving increasing attention. Many researchers have studied table recognition techniques. Reviewing this work, we find, first, that recognition performance is strongly affected by the local structure of the data and by environmental factors, so the accuracy and generalization of existing algorithms still need improvement. Second, the currently popular table recognition algorithms all include table structure recognition, which typically relies on relationship information between cells and is computationally intensive. To address these issues, this paper makes the following contributions. (1) To handle the angular distortion and rotation of tables photographed in real scenes, a direction recognition network is added to a deep learning-based table detection algorithm, so that the table direction is recognized accurately while the table is detected and located. The structure of the detection algorithm is then optimized with a path aggregation network, and the loss function is improved to strengthen the model's recognition of the specific presentation form of tabular data. Comparison and ablation experiments show that the optimization scheme is feasible. (2) Text detection and text recognition algorithms are used to locate and recognize the table contents based on the detected table positions. The pre-trained text detection and text recognition models are fine-tuned separately on the XFUND dataset using a knowledge distillation strategy, and about 260,000 additional general-purpose samples collected from real scenes are used to further optimize the text recognition model. Experiments show that this optimization significantly improves the robustness and generalization of the algorithm. To produce structured table output with low model complexity and computational cost, semantic entity recognition and relation extraction replace the traditional table structure recognition task, and the table data is finally exported in full to an Excel document. (3) A visual table detection and information extraction system is designed and implemented. The system integrates the tasks covered in this paper: table detection, text detection and recognition, semantic entity recognition, and question-answer matching. It provides two main functional modules, table detection and table recognition, and two additional modules, specified-content recognition and format conversion. The effect of each function can be viewed directly in the system interface, which gives the system a certain practical application value. |
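
The idea in contribution (2), replacing table structure recognition with semantic entity recognition and relation extraction, can be illustrated with a minimal sketch. It is not the paper's implementation: the `Entity` schema, the helper names `pair_entities` and `rows_to_csv`, and the sample data are all hypothetical, and the sketch writes CSV with the Python standard library instead of an Excel document to stay dependency-free. It assumes the upstream models have already produced labeled text entities and question-to-answer links.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class Entity:
    """One recognized text region with its semantic label (hypothetical schema)."""
    id: int
    text: str
    label: str  # assumed label set: "question", "answer", "header", "other"


def pair_entities(entities, links):
    """Turn (question_id, answer_id) links into ordered (key, value) rows."""
    by_id = {e.id: e for e in entities}
    rows = []
    for q_id, a_id in links:
        q, a = by_id.get(q_id), by_id.get(a_id)
        # Skip links that point at unknown entities.
        if q is None or a is None:
            continue
        # Keep only links that join a question to an answer.
        if q.label != "question" or a.label != "answer":
            continue
        rows.append((q.text, a.text))
    return rows


def rows_to_csv(rows):
    """Serialize key-value rows as CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["field", "value"])
    writer.writerows(rows)
    return buf.getvalue()


# Made-up sample output of the entity recognition and relation extraction steps.
entities = [
    Entity(0, "Registration Form", "header"),
    Entity(1, "Name", "question"),
    Entity(2, "Li Hua", "answer"),
    Entity(3, "Exam ID", "question"),
    Entity(4, "20231234", "answer"),
]
links = [(1, 2), (3, 4), (0, 2)]  # the last link (header -> answer) is dropped

table_rows = pair_entities(entities, links)
csv_text = rows_to_csv(table_rows)
print(csv_text)
```

Because each row is built directly from a question-answer link, no cell adjacency graph or row and column grid has to be reconstructed, which is where the savings in model complexity would come from under these assumptions. Writing a real Excel file would need a spreadsheet library in place of `rows_to_csv`.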