
A Two-stage Method For Off-line Handwritten Inorganic Chemical Equations Recognition

Posted on: 2021-05-23  Degree: Master  Type: Thesis
Country: China  Candidate: Y Wang  Full Text: PDF
GTID: 2381330605461520  Subject: Applied Statistics
Abstract/Summary:
A chemical equation expresses the relationship between chemical substances and is crucial to exploring the chemical world. As handwriting touch devices become more popular, handwriting-based user input is becoming more common. How to input handwritten chemical equations into electronic devices quickly and accurately has therefore become an active research direction in pattern recognition. The main challenges include: (1) a large number of chemical symbol categories, many of them visually similar; (2) the complex two-dimensional structure of chemical formulas; (3) varied handwriting styles across writers, with both connected and broken strokes. Current sequence-to-sequence models achieve promising results on isolated word-level recognition but perform worse on long sentence-level recognition, and a chemical equation tends to be a very long sentence-level sequence.

To address this challenge, this thesis proposes a two-stage method for off-line handwritten inorganic chemical equation recognition that combines the advantages of deep learning techniques and traditional segmentation algorithms. First, we use traditional image segmentation techniques, including connected-domain segmentation and projection segmentation, to coarsely segment the chemical equation. We then use the ResNet50 deep neural network to classify the symbols obtained from the coarse segmentation. Symbol classification locates the chemical operators in the original image, and the operator positions serve as reference points for precise image segmentation. Finally, we use two deep learning frameworks, CNN+RNN+CTC and CNN+RNN+Attention, to recognize the chemical formulas and chemical operators. The recognized pieces are then combined in sequence order into a complete chemical equation.

For the experiments, we collected our own data set of handwritten chemical equations. The segmentation algorithm adopted the enhanced connected-domain segmentation method, which segments well and supports chemical formula recognition. The recognition algorithm adopted the CNN+RNN+Attention framework. At the chemical-equation level, the whole two-stage method achieved a highest recognition rate of 98.5%.
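
As an illustration of the projection segmentation named in the abstract, the following is a minimal sketch of a vertical-projection split on a binary image. It is not the thesis's implementation: the function names `column_projection` and `projection_segments`, the `min_gap` parameter, and the list-of-lists image representation (1 for ink, 0 for background) are all assumptions made for this example.

```python
from typing import List, Tuple


def column_projection(image: List[List[int]]) -> List[int]:
    """Count ink pixels (value 1) in each column of a binary image."""
    if not image:
        return []
    width = len(image[0])
    return [sum(row[x] for row in image) for x in range(width)]


def projection_segments(image: List[List[int]],
                        min_gap: int = 1) -> List[Tuple[int, int]]:
    """Split the image into column ranges (start, end), end exclusive.

    A new segment starts after at least `min_gap` consecutive empty
    columns. `min_gap` is a hypothetical tuning knob for this sketch.
    """
    profile = column_projection(image)
    segments: List[Tuple[int, int]] = []
    start = None  # first ink column of the segment being built
    gap = 0       # length of the current run of empty columns
    for x, count in enumerate(profile):
        if count > 0:
            if start is None:
                start = x
            gap = 0
        elif start is not None:
            gap += 1
            if gap >= min_gap:
                # Close the segment at the first empty column of the run.
                segments.append((start, x - gap + 1))
                start = None
                gap = 0
    if start is not None:
        # Close the last segment, dropping any trailing empty columns.
        segments.append((start, len(profile) - gap))
    return segments


# Toy image: ink in columns 0-1, 3 and 6 (hypothetical data).
toy = [
    [1, 1, 0, 1, 0, 0, 1],
    [0, 1, 0, 1, 0, 0, 1],
]
coarse = projection_segments(toy, min_gap=1)
merged = projection_segments(toy, min_gap=2)
```

With `min_gap=1` every empty column separates two pieces, while a larger `min_gap` keeps narrowly spaced strokes together. That is one plausible way to trade over-segmentation against under-segmentation during a coarse pass.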
Keywords/Search Tags: chemical equation, neural network, image segmentation, deep learning, pattern recognition