Research On Chinese Short Text Classification Based On Hybrid Neural Network

Posted on: 2020-11-18
Degree: Master
Type: Thesis
Country: China
Candidate: L Wang
Full Text: PDF
GTID: 2428330572961744
Subject: Engineering
Abstract/Summary:
With the large-scale popularization of the Internet and the rapid growth in the number of Internet users, the number of short texts produced online every day is increasing exponentially. This semi-structured or unstructured Internet text is sparse, irregular, and full of newly emerging catchwords. As one of the key technologies of information processing, Internet short text classification has made great progress in the fields of information retrieval and knowledge mining.

To improve the accuracy of Chinese short text classification and to address the problem of sparse text representation, a Chinese short text classification method based on a hybrid neural network is proposed. First, a self-defined feature word filtering mechanism filters document feature words at the phrase level and the character level. Next, a convolutional neural network and a recurrent neural network extract high-order vector features of the documents, and an attention mechanism is introduced to optimize these features. Finally, a hierarchical classifier classifies the features. The experimental results show that the model can extract document features at both the phrase level and the character level, and that it also addresses the problem of sparse text representation.

The main work and innovations of this paper are as follows:

(1) To address the ambiguity that arises after Chinese word segmentation, a self-defined feature word selection mechanism is proposed. This method constructs a high-quality global dictionary for a given category by manually filtering the whole data set under that category and combining it with information from the Internet. The dictionary contains all of the filtered high-quality phrases under the category. The selection criterion is a manual judgment that combines phrases provided by Internet sources with phrases that are highly relevant to the category. Finally, each text under the category is represented linearly in terms of the global dictionary.

(2) Because traditional feature representation methods cannot fully capture the semantic features of text, a high-order feature extraction network combining a convolutional neural network and a recurrent neural network is proposed. To further highlight the extracted high-order feature vectors, an attention mechanism is proposed to optimize them. The optimized phrase-level and character-level vectors are then merged to form the final vector representation of the document.

(3) Three neural network models, CNN, LSTM and CLSTM, are selected as the baseline models for comparative experiments. The experimental results show that the proposed hybrid neural network method for Chinese short text classification achieves better results than the baseline models on both binary and multi-class datasets.
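The attention step and the merging of phrase-level and character-level vectors described in (2) can be illustrated with a minimal sketch. It is not the thesis's implementation: the abstract does not specify the attention scoring function or the merge operation, so dot-product scoring against a query vector and plain concatenation are assumptions, and the names `attention_pool`, `merge_levels`, and `query` are hypothetical.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(features, query):
    """Weight a sequence of feature vectors and sum them into one vector.

    features: list of T vectors of length d (e.g. CNN/RNN outputs per position)
    query:    vector of length d (assumed here to stand in for a learned parameter)
    """
    # Assumption: dot-product scoring; the abstract does not name the scoring function.
    scores = [sum(f_j * q_j for f_j, q_j in zip(f, query)) for f in features]
    weights = softmax(scores)
    dim = len(features[0])
    pooled = [sum(w * f[j] for w, f in zip(weights, features)) for j in range(dim)]
    return pooled, weights


def merge_levels(phrase_vec, char_vec):
    """Assumption: 'merged' is taken to mean concatenation of the two vectors."""
    return list(phrase_vec) + list(char_vec)


# Toy inputs standing in for phrase-level and character-level feature sequences.
phrase_feats = [[1.0, 0.0], [0.0, 1.0]]
char_feats = [[0.5, 0.5], [1.0, 1.0], [0.0, 2.0]]
query = [2.0, -1.0]

phrase_vec, phrase_w = attention_pool(phrase_feats, query)
char_vec, char_w = attention_pool(char_feats, query)
doc_vec = merge_levels(phrase_vec, char_vec)  # final document representation
```

In this sketch the positions whose features align best with `query` receive the largest weights, which is one common way to "highlight" informative features before the merged vector is passed to a classifier.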
Keywords/Search Tags:CNN, RNN, Short text classification, Text representation, Deep Learning