A Study Of Automatic Form-Filling Based On CNN And BiLSTM-CRF

Posted on: 2022-12-29
Degree: Master
Type: Thesis
Country: China
Candidate: S Z Wu
GTID: 2518306770470074
Subject: Automation Technology
Abstract/Summary:
Daily office work involves a large amount of form-filling, which is time-consuming, inefficient, and error-prone, so research on the theory, methods, and systems of automated form-filling is a pressing need. In his keynote speeches at the 19th General Assembly of the Chinese Academy of Sciences and the 14th General Assembly of the Chinese Academy of Engineering, President Xi Jinping made it clear that form-filling should not be allowed to waste the energy of scientists. In response to this call, this thesis develops an automated form-filling system. This is feasible because much of the academic information about scientists is available online or stored in electronic media, and because IT and AI technologies make it possible to collect, recognise, and extract from the Internet the information needed to fill most academic-oriented forms. Specifically, the main contributions of this thesis are as follows:

(1) A data set for studying personal information classification and mining is established. At present, few researchers work on personal information classification and mining, and no public data set exists. This thesis crawls more than 20,000 records from Baidu Encyclopedia, Scholar Network, Baidu Academic, and other websites and labels them manually.

(2) Because classification methods based on word vectors cannot effectively capture the relationships between words, this thesis proposes a character-based convolutional neural network whose input is represented by pre-trained character vectors. RNN, RCNN, FastText, and TextCNN models based on word vectors and on character vectors are compared experimentally, with different parameters used for training. The experimental results show that the proposed convolutional network model based on character vectors achieves better results.

(3) The information obtained by direct classification is still raw information from the Web that was entered by users, so it is to some degree redundant or wrong. Moreover, filling styles and data formats differ among users, which hinders further information mining. Fine-grained entity extraction is therefore performed with a bidirectional Long Short-Term Memory network (BiLSTM) based on pre-trained word vectors and combined with a conditional random field (CRF). Common RNN, LSTM, and BiLSTM models are used for comparison. The experimental results show that the BiLSTM based on pre-trained word vectors combined with the CRF achieves better performance.

(4) An automatic form-filling system is built on the personal information classification and extraction models. It fills in forms for individuals automatically and effectively improves office efficiency. All data can be corrected within the system to obtain more accurate data. In addition, the data can be used for incremental training so that the model generalises better.
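The abstract does not describe how the CRF layer is decoded, so the following is only a minimal sketch of the Viterbi step that a BiLSTM-CRF tagger commonly uses at inference time. Everything in it is an assumption made for illustration: the tag set, the hand-written emission scores (which in a real model would come from the BiLSTM), the transition scores, and the function name `viterbi_decode`. It is not the thesis's implementation.

```python
from typing import Dict, List


def viterbi_decode(
    emissions: List[Dict[str, float]],
    transitions: Dict[str, Dict[str, float]],
    tags: List[str],
) -> List[str]:
    """Return the highest-scoring tag sequence for a linear-chain CRF."""
    if not emissions:
        return []
    # best[t][tag]: best score of any path that ends in `tag` at position t.
    best = [{tag: emissions[0][tag] for tag in tags}]
    back: List[Dict[str, str]] = []
    for t in range(1, len(emissions)):
        scores: Dict[str, float] = {}
        pointers: Dict[str, str] = {}
        for cur in tags:
            # Pick the predecessor that maximises path score plus transition.
            prev = max(tags, key=lambda p: best[-1][p] + transitions[p][cur])
            scores[cur] = best[-1][prev] + transitions[prev][cur] + emissions[t][cur]
            pointers[cur] = prev
        best.append(scores)
        back.append(pointers)
    # Follow the back-pointers from the best final tag.
    path = [max(tags, key=lambda tag: best[-1][tag])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Hypothetical tag set and scores for the tokens "at", "Tsinghua", "University".
TAGS = ["O", "B-ORG", "I-ORG"]
EMISSIONS = [
    {"O": 2.0, "B-ORG": 0.1, "I-ORG": 0.0},
    {"O": 1.0, "B-ORG": 0.9, "I-ORG": 0.2},
    {"O": 0.3, "B-ORG": 0.1, "I-ORG": 1.5},
]
TRANSITIONS = {
    "O": {"O": 0.0, "B-ORG": 0.0, "I-ORG": -100.0},  # I-ORG may not follow O
    "B-ORG": {"O": 0.0, "B-ORG": 0.0, "I-ORG": 0.5},
    "I-ORG": {"O": 0.0, "B-ORG": 0.0, "I-ORG": 0.5},
}

decoded = viterbi_decode(EMISSIONS, TRANSITIONS, TAGS)
```

With these made-up scores, choosing the best tag per token independently would give O, O, I-ORG, which is not a valid BIO sequence. The transition penalty steers the decoder to O, B-ORG, I-ORG instead, which illustrates why a CRF layer on top of per-token scores can help with entity extraction.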
Keywords/Search Tags: Text Classification, Named Entity Recognition, Deep Learning, Knowledge Graph, Automatic Form-Filling