Research And Implementation Of Off-line Mathematical Formula Recognition And Cutting Technology Based On Faster-Rcnn

Posted on: 2020-04-02    Degree: Master    Type: Thesis
Country: China    Candidate: M X Yang
GTID: 2428330596476506    Subject: Engineering
Abstract/Summary:
In recent years, artificial intelligence has developed rapidly, driven by growing investment from researchers, and many applications have emerged. Computer vision is at the forefront of the field. Within computer vision, automatically extracting and understanding the mathematical formulas that appear in documents has long been a difficult problem. Developing a system that recognizes and cuts mathematical formulas, in order to automate the processing of mathematical documents, is therefore a meaningful and important task.

The main subject of this thesis is the recognition and cutting of offline mathematical formulas. "Offline" means that the collected data are pure pixel images. Unlike online data, they carry no spatial (pen-trajectory) or temporal information, which makes them harder to recognize. Traditional recognition methods rely on manually extracted formula features and generally work only under limited data conditions. Offline data, however, require handling illumination, the orientation of the handwriting, cluttered surroundings, and similar factors, which traditional methods do not address.

Beyond these environmental issues, the characteristics of mathematical formulas themselves must be considered. Natural-language text is generally linear, so recognition only has to proceed in one dimension. A formula, by contrast, can contain two-dimensional structures, such as the stacked numerator and denominator of a fraction or the semi-enclosing structure of a radical sign. Recognition therefore requires not only identifying each symbol correctly but also correctly analyzing the structural relationships among the symbols of a formula. This remains a difficult problem in the field.

This thesis focuses on recognizing and cutting handwritten mathematical formulas from elementary mathematics education in real-world scenes. We propose a mathematical formula recognition and cutting system based on Faster-Rcnn. We describe how the collected data set was manually labeled and how the Faster-Rcnn network was trained, and we report the training parameters and training results.

Building on the original Faster-Rcnn network, we propose a pre-training model enhancement method that improves the accuracy of the system by 1.9%. The method pre-trains the model with an encoder-decoder architecture, so that during transfer learning the network adapts to formula recognition and cutting faster and more effectively. In addition, we add a CNN-based regression model at the final stage of the system to help correct formula information that the system misses, which further increases recognition accuracy. Finally, the coordinates detected by the system are used to recognize and cut the mathematical formulas in the data set.

At the end of the thesis, we analyze the test results on a real-world data set. The test set consists of 1,600 randomly selected images of students' answers, containing 8,435 formulas. On this set, the system achieved a formula recognition accuracy of 87.8% and a cutting accuracy of 91.4%.
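The abstract does not specify how the final CNN-based regression stage corrects a detected box, or how the detected coordinates are turned into cut-out formula images. The following is a minimal sketch under two assumptions:

- Each formula box is an (x1, y1, x2, y2) pixel rectangle.
- The correction is expressed as (dx, dy, dw, dh) offsets in the center/size parameterization commonly used for R-CNN-style box regression.

`decode_box` and `crop_formula` are hypothetical helpers written for illustration; they are not code from the thesis.

```python
import numpy as np


def decode_box(box, deltas):
    """Hypothetical helper: refine an (x1, y1, x2, y2) box with (dx, dy, dw, dh) offsets.

    Assumes the R-CNN-style parameterization: the center shifts by dx * width and
    dy * height, and the width/height are scaled by exp(dw) and exp(dh).
    """
    x1, y1, x2, y2 = box
    w, h = x2 - x1, y2 - y1
    cx, cy = x1 + 0.5 * w, y1 + 0.5 * h
    dx, dy, dw, dh = deltas
    new_cx, new_cy = cx + dx * w, cy + dy * h
    new_w, new_h = w * np.exp(dw), h * np.exp(dh)
    return (new_cx - 0.5 * new_w, new_cy - 0.5 * new_h,
            new_cx + 0.5 * new_w, new_cy + 0.5 * new_h)


def crop_formula(image, box):
    """Hypothetical helper: cut the pixels inside `box` out of an H x W (x C) image.

    Fractional coordinates are expanded outward (floor/ceil) so partially covered
    pixels are kept, and the box is clipped to the image bounds.
    """
    height, width = image.shape[:2]
    x1, y1, x2, y2 = box
    left = max(0, int(np.floor(x1)))
    top = max(0, int(np.floor(y1)))
    right = min(width, int(np.ceil(x2)))
    bottom = min(height, int(np.ceil(y2)))
    return image[top:bottom, left:right]


if __name__ == "__main__":
    page = np.zeros((100, 200), dtype=np.uint8)  # stand-in for a scanned answer image
    detected = (50.0, 20.0, 90.0, 40.0)           # made-up detector output
    refined = decode_box(detected, (0.25, 0.0, 0.0, 0.0))  # shift right by 1/4 width
    print(refined, crop_formula(page, refined).shape)
```

Clipping matters because a box widened or shifted by the regression step can extend past the edge of the answer image. Expanding fractional coordinates outward errs on the side of keeping a stroke rather than cutting it off.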
Keywords/Search Tags: Faster-Rcnn, Formula cutting, Formula recognition