Research On The Feature Selection Method Of News Text Based On Cloud Model

Posted on: 2019-07-05
Degree: Master
Type: Thesis
Country: China
Candidate: L Song
Full Text: PDF
GTID: 2438330548457807
Subject: Computer application technology
Abstract/Summary:
With the rapid development of Internet technology, the volume of text on the Internet available to the general public has been growing at an exponential rate. Improving the efficiency and accuracy of news text categorization, making the classification of news texts higher in quality and more intelligent, and quickly extracting the information that users need are therefore of great significance. Feature selection is one of the main methods used in news text classification. Commonly used feature selection methods do not consider the relationship between feature words and categories comprehensively, and they assume a balanced data set by default, so existing text feature selection methods often have certain flaws. The diversity, complexity, and uncertainty of news text itself, together with the growing number of hot words and the difficulty of determining the size of the feature subset space, make feature selection harder to study. This thesis addresses the problem of unbalanced features and samples in news texts. The research covers the following aspects.

First, the theories and techniques related to news text classification are studied. Several commonly used feature selection methods are reviewed and compared in terms of their advantages, disadvantages, and applicable conditions. In view of the deficiencies of common feature selection techniques and the uncertainty of feature items, the thesis applies fuzzy set theory at the granularity level of feature items, improves the expected cross-entropy feature selection method, and proposes the AFECE feature selection method. Three commonly used feature selection techniques and AFECE are simulated with the same classifier model. A comparison of three evaluation indices shows that the proposed method is effective.

Second, a technique based on parameter optimization is studied. All of the feature selection methods mentioned above face the same problem: the size k of the feature subset space is difficult to determine. To solve this problem, the thesis adds particle swarm optimization to tune the parameter, and uses the stability and randomness of the cloud model to counter the premature convergence of traditional particle swarm optimization.

Third, the problem of data imbalance is studied. Although the improved feature selection method can select the optimal feature subset, classification performance is not ideal for imbalanced, sparse, and edge data samples. To solve this problem, an AFKNN classifier is proposed. It is based on the KNN classifier model, assigns the K value fuzzily, and weakens these interferences.

Simulation experiments show that the feature subsets selected by the AFECE feature selection method give the classification model higher classification performance. The particle swarm optimization method based on the cloud model needs fewer feature dimensions to achieve better classification results. For unbalanced texts and small data sets, when the feature dimension is 100, the accuracy of the AFKNN classifier is about 3% higher than that of the conventional classifier.
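For readers unfamiliar with the baseline that AFECE improves on, the following is a minimal sketch of the standard expected cross-entropy score with top-k selection. It is not the AFECE method itself: the abstract does not specify the fuzzy-set modification, so none is attempted here. The function names, the document-frequency probability estimates, and the toy data in the note below are assumptions made for illustration only.

```python
import math
from collections import Counter, defaultdict


def expected_cross_entropy(docs, labels):
    """Score each term t by P(t) * sum_c P(c|t) * log(P(c|t) / P(c)).

    docs is a list of token lists, labels the matching class labels.
    Probabilities are estimated from document frequencies (an assumption;
    other estimators are possible).
    """
    n_docs = len(docs)
    class_count = Counter(labels)            # documents per class
    term_count = Counter()                   # documents containing each term
    term_class_count = defaultdict(Counter)  # class counts per term
    for tokens, label in zip(docs, labels):
        for term in set(tokens):             # count each term once per document
            term_count[term] += 1
            term_class_count[term][label] += 1

    scores = {}
    for term, n_t in term_count.items():
        p_t = n_t / n_docs
        total = 0.0
        # Classes where the term never occurs contribute 0 and are skipped.
        for label, n_tc in term_class_count[term].items():
            p_c_given_t = n_tc / n_t
            p_c = class_count[label] / n_docs
            total += p_c_given_t * math.log(p_c_given_t / p_c)
        scores[term] = p_t * total
    return scores


def select_top_k(scores, k):
    """Return the k highest-scoring terms, breaking ties alphabetically."""
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]
```

On a toy corpus of four two-word documents split evenly between two classes, a term that appears only in one class scores higher than a term spread evenly across both, which scores zero. The choice of k is left to the caller, which is exactly the parameter the thesis proposes to tune with cloud-model particle swarm optimization.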
Keywords/Search Tags: text classification, AFECE feature selection, CPSO, AFKNN classification model