Research Of News Text Classification Based On Multi-Model Fusion

Posted on: 2020-03-24
Degree: Master
Type: Thesis
Country: China
Candidate: Y L Jin
Full Text: PDF
GTID: 2428330590971750
Subject: Computer technology
Abstract/Summary:
Text classification assigns categories to unlabeled texts by means of a trained model. It has achieved good results in many applications, but existing work focuses mainly on short texts such as e-mail and Weibo posts. For long texts such as news articles, classification performance is still unsatisfactory and needs further research. Although existing classification methods can improve accuracy, they still suffer from high-dimensional and sparse features. To solve these problems, this thesis proposes a feature selection algorithm based on three-way decisions. While preserving classification performance, the algorithm evaluates feature words along additional dimensions in order to reduce the number of feature words and alleviate the problem of high-dimensional, sparse features. This thesis also designs a text semantic generation model (TR-CNN) based on multi-model fusion, which effectively enhances the semantic generation of deep-learning models on long texts. The main contents are as follows:

1. To address the high-dimensional and sparse features produced by traditional feature selection algorithms, a three-way decision feature selection algorithm is proposed. First, traditional feature selection algorithms are systematically analyzed, and it is found that they weight feature words in a relatively single and one-sided way. Second, following three-way decision theory, feature words are filtered by the votes of two decision functions, which divide the feature words in the sample space into a positive domain, a boundary domain, and a negative domain. Then, the feature words in the boundary domain are processed further to determine the final feature set. Finally, experiments on the THUCNews dataset show that the proposed method improves the quality of the feature words and reduces their number.

2. To address the poor performance of deep-learning models on long text classification tasks, a new method based on multi-model fusion is proposed to generate text semantic vectors. The way information passes through the hidden layer of a recurrent neural network resembles that of the human brain, and the Transformer model captures global text semantic information better than other deep-learning models. Therefore, this thesis combines a convolutional neural network and a recurrent neural network to generate the local semantic vector, uses the Transformer model to generate the global semantic vector, and then joins the local and global semantic vectors to construct a new semantic vector. The method is tested on the THUCNews dataset to verify its effectiveness.
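The three-way decision step in item 1 can be illustrated with a minimal Python sketch. The abstract does not specify the two decision functions, the thresholds, or how the boundary domain is resolved, so everything concrete here is an assumption: the score dictionaries `scores_a` and `scores_b`, the thresholds `alpha` and `beta`, and the mean-score rule in `resolve_boundary` are hypothetical stand-ins, not the thesis's actual method.

```python
from typing import Dict, Set, Tuple


def vote(score: float, alpha: float, beta: float) -> int:
    """One decision function's vote: +1 accept, -1 reject, 0 defer."""
    if score >= alpha:
        return 1
    if score <= beta:
        return -1
    return 0


def three_way_split(
    scores_a: Dict[str, float],
    scores_b: Dict[str, float],
    alpha: float = 0.7,
    beta: float = 0.3,
) -> Tuple[Set[str], Set[str], Set[str]]:
    """Split feature words into positive, boundary, and negative domains.

    A word is positive only if both functions accept it, negative only if
    both reject it, and falls into the boundary domain otherwise.
    """
    positive, boundary, negative = set(), set(), set()
    for word in scores_a:
        total = vote(scores_a[word], alpha, beta) + vote(scores_b[word], alpha, beta)
        if total == 2:
            positive.add(word)
        elif total == -2:
            negative.add(word)
        else:
            boundary.add(word)
    return positive, boundary, negative


def resolve_boundary(
    boundary: Set[str],
    scores_a: Dict[str, float],
    scores_b: Dict[str, float],
    alpha: float = 0.7,
    beta: float = 0.3,
) -> Set[str]:
    """Assumed rule: keep boundary words whose mean score reaches the midpoint."""
    midpoint = (alpha + beta) / 2
    return {w for w in boundary if (scores_a[w] + scores_b[w]) / 2 >= midpoint}


# Hypothetical normalized scores from two different weighting functions.
scores_a = {"football": 0.9, "match": 0.8, "the": 0.1, "market": 0.75, "today": 0.4}
scores_b = {"football": 0.85, "match": 0.4, "the": 0.05, "market": 0.2, "today": 0.35}

positive, boundary, negative = three_way_split(scores_a, scores_b)
final_features = positive | resolve_boundary(boundary, scores_a, scores_b)
```

With these made-up scores, only words that both functions accept enter the feature set directly, while disputed words are deferred and judged a second time, which is the general idea of deferring uncertain decisions to a boundary domain.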
Keywords/Search Tags:three-way decision, feature selection, multi-model fusion, text classification, semantic vector
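The fusion of local and global semantic vectors described in the abstract can be sketched as follows. This is only an illustration of the joining step, under loud assumptions: the abstract does not say how the vectors are joined, so concatenation is assumed, and the two encoders are replaced by simple pooling stand-ins (`local_semantics` is a windowed average with max-pooling in place of the CNN and RNN branch, `global_semantics` is mean pooling in place of the Transformer). All function names are hypothetical.

```python
from typing import List

Vector = List[float]


def local_semantics(tokens: List[Vector], window: int = 2) -> Vector:
    """Stand-in for the CNN/RNN branch: average each sliding window of
    token vectors, then max-pool over windows in every dimension."""
    dim = len(tokens[0])
    n_windows = max(1, len(tokens) - window + 1)
    pooled = [float("-inf")] * dim
    for start in range(n_windows):
        chunk = tokens[start:start + window]
        for d in range(dim):
            avg = sum(vec[d] for vec in chunk) / len(chunk)
            pooled[d] = max(pooled[d], avg)
    return pooled


def global_semantics(tokens: List[Vector]) -> Vector:
    """Stand-in for the Transformer branch: mean over all token vectors."""
    dim = len(tokens[0])
    return [sum(vec[d] for vec in tokens) / len(tokens) for d in range(dim)]


def fuse(local_vec: Vector, global_vec: Vector) -> Vector:
    """Assumed fusion rule: concatenate the local and global vectors."""
    return local_vec + global_vec


# Three hypothetical 2-dimensional token embeddings.
tokens = [[1.0, 0.0], [0.0, 2.0], [3.0, 1.0]]
fused = fuse(local_semantics(tokens), global_semantics(tokens))
```

The fused vector has twice the dimension of either input, so a downstream classifier sees both the local and the global view of the text.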