Research On Multi-label Classification Method Of Chinese Short Text Based On Multi-dimensional Feature Fusion

Posted on:2022-12-26

Degree:Master

Type:Thesis

Country:China

Candidate:P F Guo

Full Text:PDF

GTID:2518306749971779

Subject:Automation Technology

Abstract/Summary:

To identify a user's intent accurately and help the user find target information quickly, short texts must be classified with a limited number of labels that still express the meaning of the target information as completely as possible. Chinese short texts are typically brief, non-standard in expression, and diverse in content, so traditional single-label text classification algorithms cannot classify them effectively. Multi-label classification of Chinese short texts has therefore long been a key research direction. Building on existing multi-label text classification methods, this thesis improves the text representation method and the feature extraction strategy, and proposes CRC-MHA, a multi-label classification model for Chinese short texts based on multi-dimensional feature fusion. The main research contents are:

(1) Several text representation methods are compared, including Word2vec, BERT word vectors, and BERT sentence vectors. The experimental results show that, compared with a single text representation method, feature vectors that fuse multiple dynamic word embeddings learn more comprehensive semantic feature information, because they make full use of the feature representation ability that the pre-trained language model has learned from massive text datasets.

(2) In the feature extraction layer, a strategy is designed that combines feature extraction models such as CNN and RCNN with a multi-head self-attention mechanism in parallel. It combines the ability of multi-head self-attention and Bi-LSTM to extract global key features with CNN's ability to capture local text features, and fuses this multi-dimensional feature information to represent sentence semantics, yielding better classification results.

The thesis makes two innovations. First, the BERT model and whole-word masking are used for dynamic word embedding in the text representation layer, and the generated word vectors and sentence vectors are fused as multi-dimensional features, so that the contextual semantics of the text are better represented by drawing on large-scale pre-training text. Second, in the feature extraction layer, a parallel feature extraction strategy combining CNN, RCNN, and multi-head self-attention is designed to strengthen the capture of key features in short texts and improve classification.

The experimental results show that the CRC-MHA model improves the weighted F1 score by 2.07%, 0.54%, and 0.46% over the BERT, BERT-CNN, and BERT-RCNN models, respectively, indicating that the proposed innovations improve classification and verifying the effectiveness of the model.
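
The abstract reports results as a weighted F1 score but does not say how it is computed. A minimal sketch of one common definition follows, assuming binary indicator label matrices and weighting each label's F1 by its support (the number of true instances of that label). The function name `weighted_f1` and the toy data are illustrative assumptions, not taken from the thesis.

```python
from typing import Sequence


def weighted_f1(y_true: Sequence[Sequence[int]],
                y_pred: Sequence[Sequence[int]]) -> float:
    """Support-weighted mean of per-label F1 scores.

    Assumption: y_true and y_pred are 0/1 indicator matrices of shape
    (n_samples, n_labels). This is one common definition of weighted F1;
    the thesis may use a different variant.
    """
    n_labels = len(y_true[0])
    weighted_sum = 0.0
    total_support = 0
    for j in range(n_labels):
        pairs = [(t[j], p[j]) for t, p in zip(y_true, y_pred)]
        tp = sum(1 for t, p in pairs if t == 1 and p == 1)
        fp = sum(1 for t, p in pairs if t == 0 and p == 1)
        fn = sum(1 for t, p in pairs if t == 1 and p == 0)
        support = tp + fn  # number of true instances of label j
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        weighted_sum += support * f1
        total_support += support
    return weighted_sum / total_support if total_support else 0.0


# Toy example: 4 short texts, 3 candidate labels (hypothetical data).
y_true = [[1, 0, 1], [0, 1, 0], [1, 1, 0], [1, 0, 0]]
y_pred = [[1, 0, 0], [0, 1, 0], [1, 0, 0], [1, 1, 0]]
print(round(weighted_f1(y_true, y_pred), 4))
```

In the toy data, the per-label F1 values are 1.0, 0.5, and 0.0 with supports 3, 2, and 1, so frequent labels dominate the average. That is why weighted F1 is often preferred when the label distribution is imbalanced.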

Keywords/Search Tags:

Multi-label classification, Feature fusion, Dynamic word embedding

Related items

1	Research On The Multi-label Classification Methods With The Label Embedding And Structure Information
2	Research On The Essential Technology Of Multi-Label Chinese Text Classification
3	Research On Classification Algorithm Based On Multi-label Learning
4	Research On The Multi-label Feature Selection And Classification Methods With The Label Correlations
5	Research On Multi-label Image Classification Algorithm Combined With Dictionary Learning
6	Research On Word Spotting Technology In Handwritten Historical Document Images
7	Research On Joint Embedded Multi-label Classification Algorithm
8	Dynamic Weighting Of Word Embedding And Distributed Learning Strategies
9	Multi-label Classification Of Captioned Images Based On Deep Learning
10	User Feature Recognition Based On Spatio-temporal Word Embedding Of Trajectory