
Research On Multi-label Classification Method Of Chinese Short Text Based On Multi-dimensional Feature Fusion

Posted on: 2022-12-26
Degree: Master
Type: Thesis
Country: China
Candidate: P F Guo
GTID: 2518306749971779
Subject: Automation Technology
Abstract/Summary:
To accurately identify a user's intention and help the user quickly find the target information, short texts must be classified with a limited number of labels while expressing the meaning of the target information as completely as possible. Chinese short texts are typically brief, nonstandard in expression and diverse in content, so traditional single-label text classification algorithms cannot classify them effectively. Multi-label classification of Chinese short texts has therefore long been a key research direction. Building on existing multi-label text classification methods, this thesis improves the text representation method and the feature extraction strategy, and proposes CRC-MHA, a multi-label classification model for Chinese short texts based on multi-dimensional feature fusion.

The main research contents include:

(1) Various text representation methods are compared, including Word2vec, BERT word vectors and BERT sentence vectors. The experimental results show that, compared with a single text representation method, feature vectors that fuse multiple dynamic word embeddings learn more comprehensive semantic features, because they make full use of the text representation ability that the pre-trained language model has learned from massive text data sets.

(2) In the feature extraction layer of the model, a feature extraction strategy is designed that combines feature extraction models such as CNN and RCNN with a multi-head self-attention mechanism in parallel. By combining the ability of multi-head self-attention and Bi-LSTM to extract global key features with the ability of CNN to capture local text features, the strategy integrates multi-dimensional feature information to represent the semantic features of sentences and thereby obtains a better classification effect.

There are two innovations in this thesis. First, in the text representation layer, the BERT model with whole-word masking is used for dynamic word embedding, and the generated word vectors and sentence vectors are fused as multi-dimensional features, so that the contextual semantics of the text are better represented by drawing on pre-training over massive text corpora. Second, in the feature extraction layer, a parallel feature extraction strategy combining CNN, RCNN and a multi-head self-attention mechanism is designed to enhance the capture of key features in short texts and improve the classification effect.

The experimental results show that the CRC-MHA model improves the weighted F1 score by 2.07%, 0.54% and 0.46% compared with the BERT, BERT-CNN and BERT-RCNN models, respectively, which indicates that the proposed innovations improve the classification effect and verifies the effectiveness of the model.
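The weighted F1 score cited in the results can be illustrated with a minimal sketch. It assumes the common definition for multi-label data: compute an F1 score per label, then average the scores weighted by each label's support (its number of true instances). The abstract does not state the exact formula used in the thesis, so the function name `weighted_f1` and the toy label matrices below are illustrative assumptions, not the author's evaluation code.

```python
from typing import Sequence


def weighted_f1(y_true: Sequence[Sequence[int]],
                y_pred: Sequence[Sequence[int]]) -> float:
    """Support-weighted mean of per-label F1 for binary indicator matrices.

    Assumed definition (not confirmed by the abstract): each row is one
    text, each column is one label, entries are 0 or 1.
    """
    n_labels = len(y_true[0])
    weighted_sum = 0.0
    total_support = 0
    for j in range(n_labels):
        pairs = [(t[j], p[j]) for t, p in zip(y_true, y_pred)]
        tp = sum(1 for t, p in pairs if t == 1 and p == 1)
        fp = sum(1 for t, p in pairs if t == 0 and p == 1)
        fn = sum(1 for t, p in pairs if t == 1 and p == 0)
        support = tp + fn                  # true instances of label j
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0  # F1 = 2TP / (2TP + FP + FN)
        weighted_sum += support * f1
        total_support += support
    return weighted_sum / total_support if total_support else 0.0


# Toy example: 4 short texts, 3 candidate labels (hypothetical data).
y_true = [[1, 0, 1], [0, 1, 0], [1, 1, 0], [1, 0, 0]]
y_pred = [[1, 0, 0], [0, 1, 0], [1, 0, 0], [1, 1, 0]]
print(round(weighted_f1(y_true, y_pred), 4))  # 0.6667
```

In this toy case the per-label F1 scores are 1.0, 0.5 and 0.0 with supports 3, 2 and 1, so frequent labels dominate the average. This is why weighted F1 is often preferred when label frequencies are imbalanced.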
Keywords/Search Tags:Multi-label classification, Feature fusion, Dynamic word embedding