
Cost-sensitive Feature Selection Algorithm Based On Self-paced Learning And Principal Component Analysis

Posted on: 2022-07-02
Degree: Master
Type: Thesis
Country: China
Candidate: C Q Ma
Full Text: PDF
GTID: 2518306485486104
Subject: Software engineering
Abstract/Summary:
With the deepening research and exploration of network technology, Internet-based applications have developed rapidly in today's society. While the wide application of Internet of Things and artificial intelligence technologies brings convenience to human production and life, it also generates massive amounts of information and high-dimensional data. Processing and using high-dimensional data increases computer memory consumption and significantly increases computing time. Outliers and noise points in the data samples also interfere with the model training process and affect the model's accuracy. Therefore, it is of great significance to filter the data and to improve the stability and robustness of the model. Feature selection eliminates redundant information in the samples, retains important features, and improves the efficiency of subsequent classification or clustering tasks. Robust learning increases the model's adaptability to abnormal samples, improves its anti-interference ability, and makes the algorithm more stable. In real life, collecting sample features usually requires varying amounts of time, manpower, and money, so the key features need to be filtered appropriately. Moreover, if the class distribution of the data is imbalanced, that is, when the numbers of samples in different categories differ significantly, the classifier favors the majority classes, which reduces the reliability of the classification results. Therefore, this paper introduces cost-sensitive learning to address these problems and to account for the cost caused by misclassification.

Building on a feature selection algorithm, this paper introduces cost-sensitive learning to handle imbalanced class distributions and reduce misclassification costs. The self-paced learning framework and different norms are used to constrain the feature weight matrix, which improves the robustness of the model, effectively reduces the data dimensionality, and achieves sparse learning. The algorithm keeps the total misclassification cost at a minimum and obtains better classification results. The core contents of the proposed algorithms are as follows:

(1) A robust cost-sensitive feature selection algorithm via self-paced learning regularization (RCSFS_SP). The RCSFS_SP algorithm introduces cost-sensitive learning into the feature selection framework and embeds the regularization term of the self-paced learning framework. This framework controls the number of samples selected during model training by adjusting the self-paced learning step parameter, which reduces the influence of outliers on the model. The ?-adaptive loss function is used to constrain the feature selection model; by adjusting the value of ?, it avoids the sensitivity of the l1-norm and the l2-norm to different data points and increases the anti-interference ability and robustness of the model. The squared l2,1-norm is used to constrain the weight matrix to achieve sparse learning. Related experiments verify that RCSFS_SP performs better than the comparison algorithms on different data sets.

(2) A cost-sensitive feature selection algorithm based on principal component analysis (PCSFS). A principal component analysis regularization term is embedded in the feature selection algorithm. This term ensures that the variance of the selected data samples is maximized while the main information content is preserved without introducing new information, which effectively eliminates redundant information and reduces the data dimensionality. Orthogonal constraints are imposed on the model to ensure that the selected features are linearly independent. The l2,p-norm is used to constrain the feature weight matrix to achieve sparse learning and feature selection and to enhance the model's stability. Cost-sensitive learning is introduced to account for the impact of cost on the model. Experiments on related public data sets show that PCSFS yields a significant performance improvement and outperforms other related feature selection algorithms.

This paper first improves on traditional feature selection algorithms, which do not consider the costs of feature acquisition and misclassification during feature selection; on this basis, cost-sensitive learning theory is introduced to solve such problems. The self-paced learning framework and different norm regularization terms ensure the stability, robustness, and generalization performance of the model and achieve feature selection. A Support Vector Machine (SVM) classifier is used to classify the samples restricted to the selected optimal feature subset, and multiple indicators are used to evaluate the experimental results. The final experimental data show that the proposed algorithms achieve good results on real two-class classification data sets. In future work, we consider extending the existing algorithms to multi-class cost-sensitive feature selection and improving them to broaden their scope of application.
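The self-paced mechanism described above can be illustrated with a minimal sketch. This is not the thesis's implementation: the hard-threshold sample weights, the toy regression data, and the hyperparameters (the pace threshold `lam` and its growth factor `mu`) are illustrative assumptions, chosen only to show how gradually raising the pace parameter admits "easy" (low-loss) samples first and keeps outliers out of the fit.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression data with two injected outliers (illustration only).
X = rng.normal(size=(50, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true + 0.1 * rng.normal(size=50)
y[:2] += 8.0  # outliers

def spl_weights(losses, lam):
    # Hard self-paced regularizer: v_i = 1 if loss_i < lam, else 0.
    return (losses < lam).astype(float)

w = np.zeros(3)
lam, mu = 0.5, 1.5            # pace threshold and its growth factor (assumed values)
for _ in range(8):
    losses = (X @ w - y) ** 2
    v = spl_weights(losses, lam)              # select currently "easy" samples
    # Weighted least-squares update restricted to the selected samples.
    w = np.linalg.lstsq(X * v[:, None], v * y, rcond=None)[0]
    lam *= mu                                 # gradually admit harder samples

print(np.round(w, 2))
```

Because the outliers keep a large squared loss even once `lam` has grown to cover all clean samples, they are never selected, and the recovered weights stay close to the clean-data solution.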
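The row-sparse norm constraints used by both algorithms can also be sketched briefly. The helper names below are hypothetical, and the example uses the l2,1-norm (the p = 1 case of the l2,p-norm mentioned above): summing the l2 norms of the rows of the weight matrix drives whole rows toward zero, so a feature can be kept or discarded by ranking the rows by their l2 norm.

```python
import numpy as np

def l21_norm(W):
    # Sum of the l2 norms of the rows of W: small row norms mean the
    # corresponding feature contributes little and can be dropped.
    return np.sqrt((W ** 2).sum(axis=1)).sum()

def select_features(W, k):
    # Rank features by the l2 norm of their weight-matrix row; keep the top k.
    scores = np.sqrt((W ** 2).sum(axis=1))
    return np.argsort(scores)[::-1][:k]

# Illustrative 3-feature, 2-class weight matrix (not from the thesis).
W = np.array([[0.9, -0.4],
              [0.0,  0.01],
              [0.5,  0.6]])
print(l21_norm(W))
print(select_features(W, 2))
```

Here the second row is nearly zero, so that feature is ranked last and excluded, which is exactly the sparsity pattern the l2,1-norm regularization is meant to induce.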
Keywords/Search Tags: Feature selection, Cost-sensitive, Self-paced learning, Principal Component Analysis, Robustness