With the booming development of the Internet, the amount of textual information has grown exponentially. In the era of big data, text has become the main source of information. Classifying textual information makes it easy to summarise and analyse large amounts of data, and is one of the important tasks in natural language processing. Most of the data obtained in everyday life is short text. Since short texts are characterised by high noise, high sparsity, huge volume, and little contextual content, it is extremely important for today's fast-paced, intelligent society to accurately and quickly extract the important information from this huge amount of data according to the user's needs. At present, with the continuous development of neural networks, text representation methods mainly rely on the traditional Word2Vec word vector model and the emerging BERT pre-trained language model to represent words as vectors. These methods vectorize text well, but cannot solve the problems of ambiguity, weak semantics, and sparse features in Chinese short texts. This thesis addresses the above problems and investigates text classification models based on deep learning. The main research content of this thesis is as follows:

(1) To address the issue that existing models do not consider both the overall and local features of text, this thesis introduces a mixed pooling method that combines mean pooling and max pooling into mainstream text classification models. Three mixed pooling methods are compared to improve model performance: summation mixed pooling, cascade mixed pooling, and proportional mixed pooling.

(2) To address the ambiguity of Chinese text, this thesis proposes a hybrid BiLSTM-Mix neural network model based on the ERNIE model. First, the ERNIE model is used to vectorize the text data and generate dynamic word vector representations, yielding richer semantic feature information. Then, contextual semantic features are obtained through a Bi-directional Long Short-Term Memory (BiLSTM) network, and a mixed pooling method combining max pooling and mean pooling is used to extract text features a second time, in order to obtain more accurate feature information.

(3) To address the sparse features of Chinese short text, this thesis first establishes two single-channel models, a RoBERTa-DPCNN model and a RoBERTa-BiGRU-Mix model, and on this basis proposes a dual-channel neural network model based on the RoBERTa model. The RoBERTa model produces a more context-aware semantic representation of the text; the BiGRU-Mix channel performs a secondary extraction of contextual global features; the DPCNN channel obtains deep local features; and finally the global features and local features are concatenated. Experimental results show that the proposed dual-channel model achieves the best performance on the Chinese short text classification task compared with traditional neural network models and the single-channel models established in this thesis.
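The three mixed pooling variants compared in (1), which also underlie the "-Mix" modules in (2) and (3), can be sketched on a toy sequence of feature vectors. This is a minimal pure-Python illustration, not the thesis implementation: in practice the pooling would operate on BiLSTM/BiGRU hidden states, and the function names and the weight `alpha` in proportional mixing are illustrative assumptions.

```python
def mean_pool(seq):
    # Element-wise mean over the sequence (time-step) dimension.
    n = len(seq)
    return [sum(col) / n for col in zip(*seq)]

def max_pool(seq):
    # Element-wise max over the sequence dimension.
    return [max(col) for col in zip(*seq)]

def sum_mix(seq):
    # Summation mixed pooling: add the max- and mean-pooled vectors.
    return [a + b for a, b in zip(max_pool(seq), mean_pool(seq))]

def cascade_mix(seq):
    # Cascade mixed pooling: concatenate the two pooled vectors,
    # doubling the output feature dimension.
    return max_pool(seq) + mean_pool(seq)

def proportional_mix(seq, alpha=0.5):
    # Proportional mixed pooling: weighted combination of the two pooled
    # vectors; alpha = 0.5 is an assumed hyperparameter, not a thesis value.
    return [alpha * a + (1 - alpha) * b
            for a, b in zip(max_pool(seq), mean_pool(seq))]

# Toy "hidden states": 3 time steps, 2 features each.
h = [[1.0, 4.0], [3.0, 0.0], [2.0, 2.0]]
print(sum_mix(h))           # [5.0, 6.0]
print(cascade_mix(h))       # [3.0, 4.0, 2.0, 2.0]
print(proportional_mix(h))  # [2.5, 3.0]
```

Note the design trade-off the thesis compares: summation and proportional mixing preserve the feature dimension (keeping downstream classifier layers unchanged), while cascade mixing retains both pooled views at the cost of a wider feature vector.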