Research On Chinese Short Text Representation And Classification

Posted on: 2022-12-02
Degree: Doctor
Type: Dissertation
Country: China
Candidate: M Hao
GTID: 1488306605475234
Subject: Software engineering
Abstract/Summary:
In recent years, with the wide application of Internet information technology and the popularity of mobile intelligent terminals, short text has become an important source of information in people's daily lives and an important carrier of information in many Internet applications. Massive, multi-source short text data is now easy to obtain, including search snippets, product reviews, news headlines, and so on. However, quickly and accurately mining important information from such data according to the personalized needs of society and users remains a great challenge. Research on short text classification can help systems "understand" and "manage" all kinds of short text data more efficiently, which plays an important role in promoting the development of social intelligence.

At present, deep-learning-based short text representation can express a short text as a low-dimensional, dense feature vector, which addresses the "high dimensionality", "strong sparsity" and "semantic gap" problems of traditional representation methods. However, it still struggles with the limited context, high semantic density, and strong ambiguity of short text. This dissertation therefore focuses on the characteristics of Chinese short texts. To meet the needs of massive short text processing in specific fields, it studies deep-learning-based representation and classification of Chinese short text and applies them to tasks such as topic classification and sentiment analysis. The main contributions of this dissertation are as follows:

1. Single-feature short text representation and classification based on a convolutional attention mechanism. How to obtain a fixed-dimension text representation vector from high-level semantic features is a hot topic in text representation and classification. Because short text has a short context and dense semantic information, the soft attention mechanism that performs well in long text classification does not work well on short text. This dissertation therefore proposes a short text representation and classification method based on a convolutional attention mechanism. The method uses a convolutional neural network to capture high-level semantic features and, at the same time, incorporates temporal information to adaptively generate more accurate and reliable attention weights, which improves the classification accuracy of the model. Experimental results show that the method improves the accuracy of Chinese short text classification and also achieves good results on English long and short text classification tasks.

2. Multi-feature short text representation and classification based on a mutual attention convolutional neural network. When existing short text classification models based on multi-scale feature fusion integrate features, most of them apply max pooling to the features and concatenate the results directly. This approach ignores the fact that features at different scales lie in different semantic spaces, and it also discards a large amount of semantic feature information, which limits classification accuracy. To solve these problems, this dissertation proposes a framework based on a mutual attention convolutional neural network. It maps the character-scale and word-scale high-level semantic features into the same semantic space through a trainable mutual attention matrix and uses a three-dimensional convolutional neural network to integrate them. Experimental results on six public Chinese short text data sets show that the method reduces the loss of semantic information during multi-scale feature integration and improves the model's classification accuracy on Chinese short texts.

3. A label-embedding-based classification method for fuzzy short text samples. In real classification tasks, poorly designed category settings often produce many texts whose category is hard to predict, which we call "fuzzy samples". A variety of deep-learning-based methods address this problem by strengthening the feature extraction capability of the model, but the results are not ideal. This dissertation therefore proposes a short text fuzzy sample classification method based on label embedding. The method uses label embeddings to represent categories, so that the similarity between categories can be expressed; through training, each sample is gradually pulled toward its true label and pushed away from its wrong labels. To achieve this goal, this dissertation proposes an opposite-ternary joint loss function to train the model. Experimental results show that the method improves the classification accuracy of multiple models.
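The label-embedding idea in the third contribution (pull a sample toward its true label, push it away from wrong labels) can be illustrated with a small sketch. This is not the opposite-ternary joint loss proposed in the dissertation, whose exact form the abstract does not give; it is a generic triplet-style hinge loss over cosine similarities. The function names, the `margin` value, and the toy two-dimensional embeddings are all assumptions made for illustration, and all vectors are assumed to be non-zero.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def label_embedding_loss(sample, label_embeddings, true_idx, margin=0.5):
    """Triplet-style hinge loss (illustrative, hypothetical form).

    The similarity between the sample and its true label embedding should
    exceed the similarity to every wrong label embedding by at least `margin`.
    """
    pos = cosine(sample, label_embeddings[true_idx])
    loss = 0.0
    for idx, emb in enumerate(label_embeddings):
        if idx == true_idx:
            continue
        neg = cosine(sample, emb)
        loss += max(0.0, margin - pos + neg)
    return loss


def predict(sample, label_embeddings):
    """Assign the label whose embedding is most similar to the sample."""
    sims = [cosine(sample, emb) for emb in label_embeddings]
    return max(range(len(sims)), key=sims.__getitem__)


# Toy example: two categories with orthogonal label embeddings.
labels = [[1.0, 0.0], [0.0, 1.0]]
sample = [1.0, 0.0]
loss_correct = label_embedding_loss(sample, labels, true_idx=0)
loss_wrong = label_embedding_loss(sample, labels, true_idx=1)
predicted = predict(sample, labels)
```

In a trained model, the sample vector would come from the text encoder and the label embeddings would be learned jointly with it; here both are fixed so that the effect of the margin is easy to see. The loss is zero when the sample already sits on its true label and positive when it sits on a wrong one.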
Keywords/Search Tags:Short Text Representation, Short Text Classification, Convolutional Attention, Mutual Attention, Label Embedding