Design And Implementation Of Chinese Short Text Classification Method

Posted on: 2020-11-09  Degree: Master  Type: Thesis
Country: China  Candidate: Y F Jin  Full Text: PDF
GTID: 2428330575967950  Subject: Computer technology
Abstract/Summary:
In recent years, the Internet has developed rapidly, and the volume of online information has grown quickly, mainly in the form of short text. How to find valuable information and classify it accurately has become a focus of research. Short texts contain few words yet have high feature dimensionality, so classification methods designed for long texts perform poorly on them. To address these problems, this thesis studies techniques for short text classification. First, a short text feature extension algorithm, STFE, is designed to increase the number of effective features in a short text and thereby improve classification accuracy. Then the CAS-CNN network structure is proposed, which introduces an attention mechanism in the word vector layer to enrich the features of the word vectors from different angles and so improve classification performance. The research work comprises three parts:

(1) A frequent feature word set mining algorithm for short text, SP-Apriori, is proposed to address the low efficiency of the Apriori algorithm when mining frequent feature word sets on a single machine. The algorithm takes advantage of Spark to reduce execution time and improve the efficiency of mining frequent feature word sets.

(2) A short text feature extension algorithm based on frequent feature words, STFE, is proposed to alleviate the shortage of features in short texts. First, the SP-Apriori algorithm is used to mine frequent feature words in the corpus and to screen for effective association rules. Second, selected feature words are added to the short texts to increase their number of feature words, which supplies additional feature information for the classification task.

(3) A new network structure is designed, and a short text classification model, CAS-CNN, based on a convolutional network with attention is proposed. By using attention to weight the initial word vectors, the network focuses on important features, suppresses invalid ones, and enriches the feature representation of the word vector layer. Compared with other commonly used classification models, this method achieves a higher F1 score.
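The idea behind points (1) and (2) can be illustrated with a minimal single-machine sketch. It is not the thesis's SP-Apriori or STFE implementation: it does not use Spark, and the function names, toy corpus, and thresholds (`min_support`, `min_confidence`) are assumptions made only for illustration. It mines frequent word sets level by level in the Apriori style, keeps association rules whose confidence passes a threshold, and appends the rule consequents to a short text.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(docs, min_support, max_size=2):
    """Apriori-style, level-wise mining of frequent word sets.

    Returns a dict mapping each frequent word set (frozenset) to its support.
    """
    n = len(docs)
    doc_sets = [frozenset(d) for d in docs]
    counts = Counter(w for d in doc_sets for w in d)
    current = {frozenset([w]): c / n for w, c in counts.items()
               if c / n >= min_support}
    result = dict(current)
    for k in range(2, max_size + 1):
        items = sorted({w for s in current for w in s})
        # Apriori pruning: every (k-1)-subset of a candidate must be frequent.
        candidates = [frozenset(c) for c in combinations(items, k)
                      if all(frozenset(sub) in current
                             for sub in combinations(c, k - 1))]
        level = {}
        for cand in candidates:
            support = sum(1 for d in doc_sets if cand <= d) / n
            if support >= min_support:
                level[cand] = support
        if not level:
            break
        result.update(level)
        current = level
    return result


def association_rules(itemsets, min_confidence):
    """Derive rules x -> y from frequent pairs, filtered by confidence."""
    rules = {}
    for itemset, support in itemsets.items():
        if len(itemset) != 2:
            continue
        a, b = tuple(itemset)
        for x, y in ((a, b), (b, a)):
            confidence = support / itemsets[frozenset([x])]
            if confidence >= min_confidence:
                rules.setdefault(x, []).append(y)
    return rules


def extend_text(words, rules):
    """Append rule consequents that are not already in the short text."""
    extended = list(words)
    seen = set(words)
    for w in words:
        for y in sorted(rules.get(w, [])):
            if y not in seen:
                extended.append(y)
                seen.add(y)
    return extended


# Toy corpus of already-segmented short texts (hypothetical data).
docs = [
    ["phone", "battery", "charge"],
    ["phone", "battery"],
    ["phone", "screen"],
    ["movie", "actor"],
]
itemsets = frequent_itemsets(docs, min_support=0.5)
rules = association_rules(itemsets, min_confidence=0.8)
print(extend_text(["battery", "life"], rules))
```

In this toy corpus only the rule battery -> phone passes the confidence threshold, so the word "phone" is appended to the text containing "battery". A distributed version would parallelize the support-counting step, which is the part that dominates the running time on a large corpus.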
Keywords/Search Tags: Chinese Short Text, Feature Extension, Frequent Feature Word Mining, Convolutional Neural Network, Attention Mechanism