
Short Text Classification Based On Non-negative Matrix Factorization And Deep Learning

Posted on: 2021-05-13
Degree: Master
Type: Thesis
Country: China
Candidate: M T Huang
Full Text: PDF
GTID: 2428330611967597
Subject: Software engineering
Abstract/Summary:
The Internet has become a necessity in daily life, carrying hundreds of millions of data transmissions. Users can share information anytime and anywhere while enjoying the convenience and benefits the network brings. Short text is a fast way to transfer information, and the business value, sentiment, and event trends mined from such data are valuable to decision makers. This thesis combines non-negative matrix factorization with deep learning models to study short text classification methods. The main contributions are as follows:

1. A non-negative matrix tri-factorization algorithm based on manifold regularization (MNMTF) is proposed. Because short texts contain few words, their features are sparse, and the number of short texts grows out of balance with the number of words. To address these problems, and to reduce the unstable influence of random initialization on experimental results, a two-stage matrix factorization is used to solve for the clustering indicator matrix. Decomposing the dense association matrix reduces the negative impact of data imbalance on the results, and manifold regularization is added when the relationship matrix is decomposed to alleviate the sparseness of short text features.

2. Based on the improved non-negative matrix factorization algorithm, a short text feature extension method (NMFFE) is proposed. Feature extension increases the text length of the original data to counter feature sparseness. First, the word-category feature space is obtained through the MNMTF algorithm. Then the correlation between features is calculated from this feature space, and strongly related features are added to the short text. In addition, because data updates affect the timeliness of keywords, update rules for the word-category feature space are proposed to ensure that new features are not missed. Short text classification experiments on three public datasets show that feature expansion based on non-negative matrix factorization enriches text features, locates keywords with strong category associations, and improves short text classification performance.

3. Two short text classification models based on feature embedding are proposed. Obtaining rich information from multiple levels of a short text is a trend in text representation research. First, feature associations are calculated from the word-category feature space to obtain local and global category information, which are fused into sentence-level category features. Then a long short-term memory (LSTM) network extracts contextual semantic information, and its hidden-layer output is fused with the sentence-level category features into a multi-granular feature representation. Experiments with the two classification models on public datasets verify that short text classification based on multi-granular feature representation improves classification performance.

4. Extended information is fused into the text by constructing auxiliary sentences, converting the single-sentence classification task into a sentence-pair classification task and fine-tuning a pre-trained BERT model. Comparing the experimental results of single-sentence and sentence-pair classification shows that the auxiliary-sentence approach improves the classification performance of the BERT model.
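The manifold-regularized tri-factorization in contribution 1 can be sketched with standard multiplicative updates. The function name, the document affinity graph `W`, the regularization weight `lam`, and the exact update rules below are illustrative assumptions in the spirit of graph-regularized NMF, not the thesis's precise formulation.

```python
import numpy as np

def mnmtf(X, W, k, l, lam=0.1, iters=200, eps=1e-9):
    """Sketch of manifold-regularized non-negative matrix tri-factorization.

    X : (m, n) non-negative word-document matrix
    W : (n, n) symmetric document affinity graph (e.g. kNN similarities)
    Returns U (m, k), S (k, l), V (n, l) with X ~= U @ S @ V.T,
    where V acts as the clustering indicator matrix over documents.
    """
    rng = np.random.default_rng(0)
    m, n = X.shape
    U = rng.random((m, k))
    S = rng.random((k, l))
    V = rng.random((n, l))
    D = np.diag(W.sum(axis=1))  # degree matrix of the document graph
    for _ in range(iters):
        SVt = S @ V.T
        # multiplicative updates keep all factors non-negative
        U *= (X @ SVt.T) / (U @ SVt @ SVt.T + eps)
        S *= (U.T @ X @ V) / (U.T @ U @ S @ (V.T @ V) + eps)
        # the Laplacian term Tr(V.T (D - W) V) pulls similar documents
        # toward similar indicator rows (manifold regularization)
        V *= (X.T @ U @ S + lam * (W @ V)) / \
             (V @ S.T @ (U.T @ U) @ S + lam * (D @ V) + eps)
    return U, S, V
```

Multiplicative updates are a common choice here because they preserve non-negativity without an explicit projection step.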
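The feature extension step of contribution 2 — measuring correlation between features in the word-category space and appending strongly related features to a short text — might look like the following sketch. The function name `expand_short_text`, the cosine-similarity measure, and the `top_k`/`thresh` cutoffs are hypothetical choices, not the thesis's exact method.

```python
import numpy as np

def expand_short_text(tokens, vocab, F, top_k=2, thresh=0.5):
    """Append strongly correlated features to a short text (sketch).

    F : (|vocab|, n_categories) word-category feature matrix, e.g. the
        word factor from a tri-factorization; the correlation between
        two words is taken as the cosine similarity of their rows.
    """
    Fn = F / (np.linalg.norm(F, axis=1, keepdims=True) + 1e-12)
    sim = Fn @ Fn.T                      # pairwise word-word similarity
    idx = {w: i for i, w in enumerate(vocab)}
    expanded = list(tokens)
    for w in tokens:
        if w not in idx:
            continue
        added = 0
        for j in np.argsort(-sim[idx[w]]):   # most similar words first
            if vocab[j] != w and vocab[j] not in expanded \
                    and sim[idx[w], j] >= thresh:
                expanded.append(vocab[j])
                added += 1
            if added == top_k:
                break
    return expanded
```

Lengthening each document this way gives a downstream classifier denser feature vectors to work with.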
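The auxiliary-sentence construction in contribution 4 amounts to turning one single-sentence example into several sentence pairs, which a fine-tuned BERT model then scores. The label template below is a hypothetical illustration; the scoring model itself is omitted.

```python
def build_sentence_pairs(text, labels, template="This text is about {}."):
    """Turn a single-sentence example into sentence-pair examples,
    one per candidate label (sketch). A pair classifier such as a
    fine-tuned BERT scores each (text, auxiliary sentence) pair,
    and the highest-scoring label is predicted."""
    return [(text, template.format(label)) for label in labels]
```

For example, `build_sentence_pairs("the market fell sharply", ["finance", "sports"])` yields one pair per label, ready to be fed to a tokenizer as (sentence A, sentence B) input.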
Keywords/Search Tags:Short Text Classification, Non-negative Matrix Factorization, Long Short-Term Memory Network, Feature Embedding, Auxiliary Sentence