A Research On Text Modeling Algorithm Based On Deep Neural Network

Posted on:2021-04-24

Degree:Master

Type:Thesis

Country:China

Candidate:J F Xiang

Full Text:PDF

GTID:2428330623967951

Subject:Mathematics

Abstract/Summary:


The widespread development of the Internet and the Internet of Things (IoT) has accelerated the arrival of the era of artificial intelligence (AI), and a large amount of text data is now generated on all kinds of terminal devices. Processing this text requires a good representation algorithm. A text representation algorithm learns the semantic information of a text and transforms it into vectors that a computer can operate on numerically. Text representation is regarded as the foundation of downstream tasks in natural language processing, such as text classification, information extraction, machine translation and question answering, and has therefore attracted the attention of many scholars in recent years. Text exists mainly in three forms: word, sentence and document. Existing text representation methods fall into two families: traditional text representation algorithms and text representation algorithms based on neural networks.

Based on deep neural networks, this thesis constructs three lightweight text representation algorithms for different languages and fields, each built as an end-to-end model for the downstream task of text classification, as follows:

1) Based on an English corpus, this thesis explores the improvement that character-level subword information brings to text representation. Starting from a CNN, the character information of words is introduced and the pooling operation in the convolutional network is replaced by a self-attention network, giving the character-based hierarchical attention convolution model (E-HAC). Experiments on 6 common text classification datasets show improved accuracy over the baseline CNN model, especially on the MR dataset, where the gain is nearly 2 percentage points.

2) Focusing on a specific field in Chinese, this thesis explores the gains that stroke-level subword information brings to text representation. Starting from a CNN, the stroke information of words is introduced and the pooling operation in the convolutional network is replaced by a self-attention mechanism, giving the stroke-based hierarchical attention convolution model (C-HAC). On a classification dataset of professional legal consulting questions that we built, its accuracy is 4 percentage points higher than that of the baseline CNN model.

3) Inspired by the second-order Taylor expansion and by self-attention, the existing GRU structure is improved and the attention-based 2rd-GRU model is constructed. Compared with the baseline attention-based GRU model, accuracy on the Chinese legal-domain classification task improves by about 3 percentage points.
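As a rough illustration of the idea in items 1) and 2), the sketch below contrasts ordinary max pooling over convolution outputs with a simple attention-based pooling. The abstract does not give the exact formulation used in E-HAC or C-HAC, so this is only one common form of attention pooling under stated assumptions: the `query` vector stands in for a learned parameter, and the function names and toy numbers are hypothetical.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def max_pool(features):
    # Baseline CNN pooling: element-wise maximum over all positions.
    return [max(channel) for channel in zip(*features)]

def attention_pool(features, query):
    # Assumed formulation: score each position by its dot product with a
    # (hypothetical) learned query vector, normalise the scores with a
    # softmax, and return the weighted sum of the position vectors.
    scores = [sum(f * q for f, q in zip(vec, query)) for vec in features]
    weights = softmax(scores)
    dim = len(features[0])
    pooled = [sum(w * vec[i] for w, vec in zip(weights, features))
              for i in range(dim)]
    return pooled, weights

# Toy convolution output: 3 positions x 2 channels (made-up numbers).
features = [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]]

# A zero query scores every position equally, so the result is mean pooling.
query = [0.0, 0.0]
pooled, weights = attention_pool(features, query)
baseline = max_pool(features)
```

Unlike max pooling, which keeps only the largest activation per channel, the weighted sum lets every position contribute in proportion to its score, which is the general motivation for swapping pooling for attention.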

Keywords/Search Tags:

Text Modeling, Hierarchical Attention Convolution(HAC) Model, Characters, Strokes, 2rd-GRU Model


Related items

1	Study On Hierarchical Modeling Technique Of Virtual Environment For Behavior Planning Of Autonomous Characters
2	Strokes Extraction Of Off-line Handwritten Chinese Characters
3	Research On Multilingual Text Recognition In Complex Scenes And System Design
4	The Design Of Steganographic Method For Text Without Carrier Based On The Characteristics Of Chinese Characters
5	Research On Volume Modeling And Computation Of Hierarchical TIN Model Based On Big Data
6	Research And Implementation Of Text Classification Based On ERNIE And TextGCN
7	Research On Text Emotion Analysis Based On BiTCN And Pre-training
8	Study On Hierarchical Attention Network Model Based On Reinforcement Learning And Text Sentiment Classification
9	Design And Implementation Of Model Simulation In Hierarchical Modeling System
10	Short Text Classification Algorithm Based On Temporal Convolution And Attention Mechanism