Research On Part-of-Speech Tagging Algorithms Of Mathematical Corpus Based On Deep Learning

Posted on:2021-01-17

Degree:Master

Type:Thesis

Country:China

Candidate:F M Li

Full Text:PDF

GTID:2428330623478258

Subject:Computational Mathematics

Abstract/Summary:

With the substantial growth of computing power and the rapid development of Internet technology, research on natural language processing has steadily deepened, and corpus linguistics has matured alongside it. In recent years especially, with the rise of deep learning, corpora have become the basis on which neural network algorithms operate effectively. In natural language processing, part-of-speech tagging is a basic step toward downstream tasks and also a preprocessing stage for text data, so its accuracy greatly affects the performance of subsequent tasks. The more accurate the part-of-speech tags of a corpus and the larger the corpus, the better a neural network model trained on it performs. The construction and study of part-of-speech tagged corpora has therefore become a research hotspot for scholars in China and abroad.

As a basic discipline of the natural sciences, mathematics is closely tied to the development of many industries. At present, no dedicated part-of-speech tagged corpus for mathematics exists in China or abroad, which seriously hinders machine translation of mathematical literature and other natural language tasks. This paper therefore focuses on part-of-speech tagging and constructs a part-of-speech tagged corpus of a certain scale from mathematical scientific literature.

This paper designs an algorithm for constructing a part-of-speech tagged corpus for the mathematics domain. First, we combine neural networks and conditional random fields to build the tagging framework. Second, starting from a news part-of-speech tagging corpus, we repeatedly add mathematical data to the training, test, and validation sets while removing the same number of news sentences, and then train new models on the mixed news and mathematics data. Finally, after multiple iterations of the neural network model, we obtain a model that tags mathematical data more effectively, together with a part-of-speech tagged corpus for mathematics. The accuracy of this corpus is 98.36%, while the accuracy of existing news part-of-speech tagging corpora is between 94% and 98% [12], so the accuracy of the corpus we build is very high. Other natural language processing tasks can be carried out on the basis of this corpus.

We also use the models generated during training to run test experiments on purely mathematical test data. The results show that, as the model is iteratively optimized, the data distribution learned by each newly generated model gradually shifts from that of news data to that of mathematical data. The decoding efficiency on the test data also rises steadily until the proportion of sentences decoded correctly by the model stops changing, at which point the optimal model is obtained. When the optimal model decodes purely mathematical data, its decoding efficiency is 69.85% (in sentence units), far higher than that of a model trained on news data alone (12.82%). The optimal model has thus learned the distribution of mathematical data, and we use it to label the raw corpus of mathematical literature. Screening the output with a specific threshold then yields standard part-of-speech tagged corpus data, avoiding the high cost and low efficiency of manual tagging. In addition, the algorithm designed here for building a part-of-speech tagged corpus of mathematical literature provides a reference for constructing corpora of scientific and technical literature in other disciplines.
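
The data-replacement schedule, the sentence-level score, and the threshold screening described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the function names (swap_in_math, sentence_accuracy, screen_by_threshold) are hypothetical, the tagger itself is left out, and the abstract does not say which quantity the threshold is applied to, so a per-sentence confidence score is assumed here. Reading "decoding efficiency (in sentence units)" as the share of sentences whose tags are all correct is likewise an assumption.

```python
from typing import List, Sequence, Tuple, TypeVar

S = TypeVar("S")  # a sentence, in whatever representation the tagger uses


def swap_in_math(news: List[S], math_pool: List[S], k: int) -> Tuple[List[S], List[S]]:
    """Drop k news sentences and add k math sentences, keeping the set size fixed.

    Returns the mixed data set and the math sentences that are still unused.
    """
    k = min(k, len(news), len(math_pool))
    mixed = news[k:] + math_pool[:k]
    return mixed, math_pool[k:]


def sentence_accuracy(predicted: Sequence[Sequence[str]],
                      gold: Sequence[Sequence[str]]) -> float:
    """Share of sentences whose predicted tag sequence matches the gold one exactly."""
    if not gold:
        return 0.0
    correct = sum(1 for p, g in zip(predicted, gold) if list(p) == list(g))
    return correct / len(gold)


def screen_by_threshold(scored: Sequence[Tuple[S, float]], threshold: float) -> List[S]:
    """Keep only sentences whose (assumed) confidence score reaches the threshold."""
    return [sentence for sentence, score in scored if score >= threshold]


if __name__ == "__main__":
    news = ["n1", "n2", "n3", "n4"]
    math_pool = ["m1", "m2", "m3"]
    mixed, remaining = swap_in_math(news, math_pool, 2)
    print(mixed, remaining)
    print(sentence_accuracy([["N", "V"], ["N"]], [["N", "V"], ["V"]]))
    print(screen_by_threshold([("a", 0.9), ("b", 0.4)], 0.8))
```

In an iterative loop, swap_in_math would be called once per round, a new model trained on the mixed set, and sentence_accuracy tracked on the mathematical test data until it stops changing.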

Keywords/Search Tags:

Natural Language Processing, Part-of-speech tagging, Corpus, Deep learning

Related items

1	Research On The Construction Method Of Burmese Part-of-speech Tagging Corpus
2	The Study And Analysis Of Oracle Bone Inscriptions Based On Statistical Natural Language Processing
3	Chinese Word Found Its Part Of Speech Tagging
4	Research On Parallel Corpora-based Unsupervised Part-of-speech Tagging For Chinese
5	The Development Of Part-of-speech Tagging Software For Kazakh Language
6	Study Of Kazak Part-of-Speech Tagging Based Upon HMM
7	Research On Text Document Information Hiding
8	Research On Lao Language Part-of-speech Tagging With Multiple Features
9	Research On Korean Text Representation And Sentiment Analysis Based On Deep Neural Network
10	Research On Kirghiz Basic Part-of-Speech Tagging Based On HMM