
Research on Ensemble Learning Algorithms Based on Feature Extraction

Posted on: 2018-04-16
Degree: Master
Type: Thesis
Country: China
Candidate: B Han
GTID: 2358330518968262
Subject: Computer software and theory
Abstract/Summary:
Improving the generalization ability of learning systems has always been a central concern of machine learning. The limitations of a single classifier create a bottleneck for further gains in classification performance. An ensemble classifier, as a new machine learning model, uses multiple classifiers to predict the same problem; the final classification result is determined by its member learners, whose outputs are combined according to certain rules. Ensemble learning lets the strengths of the individual classifiers complement one another, which greatly improves the generalization and classification performance of the system. Ensemble classifiers are widely used in biomedicine, information science and other fields.

As Internet technology penetrates all areas of social life, the data to be processed becomes more complex. Imbalanced data, high-dimensional data, noisy data and other such types are prevalent. Traditional ensemble learning methods perform well on standard data but have limited effect on complex data. It is therefore important to integrate data processing methods into ensemble learning. Feature extraction is one of the most important methods in data analysis and processing, and is widely used for dimensionality reduction and noise elimination. In this thesis, building on further study of ensemble learning algorithms, we propose improved ensemble learning methods that combine feature extraction with other data processing algorithms, as described below.

Imbalanced data usually lead to poor classification results for minority-class samples. To reduce the imbalance ratio of a data set, the SMOTE oversampling algorithm can be used to preprocess the data. In this thesis, independent component analysis (ICA) is used to remove noise from the data and SMOTE is used to balance it, so that the processed data is better suited to the ensemble learning algorithm. The experimental results show that the proposed method can significantly improve the classification performance of the Bagging ensemble learning algorithm on imbalanced data.

Different types of data have a certain organization and structure, and their attributes are interrelated. Our analysis shows that the attributes of web spam data sets are not only high-dimensional but also highly correlated. Because the content and link features of web spam are high-dimensional and their attributes are correlated, this thesis, after studying these attributes in depth, applies principal component analysis (PCA) to groups of attributes rather than to the whole attribute set, which reduces the dimension while preserving the original attribute structure of the data set. The experimental results show that the proposed method performs well in web spam classification.
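To make the oversampling step concrete, here is a minimal sketch of the SMOTE interpolation idea in plain Python. It is not the thesis implementation: the function name `smote`, its parameters, and the toy data are assumptions made for illustration, and the ICA denoising and Bagging stages mentioned in the abstract are left out. Each synthetic sample is placed at a random point on the segment between a minority sample and one of its nearest minority neighbours.

```python
import random


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic minority samples (hypothetical helper).

    minority: list of equal-length numeric lists, at least two samples.
    k: number of nearest minority neighbours to choose from.
    """
    rng = random.Random(seed)
    k = max(1, min(k, len(minority) - 1))
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        # Interpolate: base + gap * (neighbour - base), with gap in [0, 1).
        gap = rng.random()
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


if __name__ == "__main__":
    minority_class = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    for point in smote(minority_class, n_new=3, k=2):
        print(point)
```

Because every new point lies between two existing minority samples, the synthetic data stays inside the region already occupied by the minority class. That also means noisy minority samples get propagated, which is one plausible reason to denoise before oversampling.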
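The grouped PCA idea can be sketched in the same way. The following plain-Python example is an illustration under stated assumptions, not the method as implemented in the thesis: the attribute groups are supplied by hand as lists of column indices (for instance one group for content features and one for link features), only the first principal component of each group is kept, and the names `first_component` and `grouped_pca` are hypothetical. The leading component is found by power iteration on the group's covariance matrix.

```python
import math
import random


def first_component(rows, iters=200, seed=0):
    """Column means and leading principal direction of rows (power iteration)."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (requires n >= 2).
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    # Random start so the vector is not orthogonal to the leading direction.
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # the group has no variance
            break
        v = [x / norm for x in w]
    return means, v


def grouped_pca(rows, groups):
    """Replace each attribute group by its score on that group's first component."""
    columns = []
    for idx in groups:
        sub = [[r[j] for j in idx] for r in rows]
        means, v = first_component(sub)
        columns.append([sum((s[j] - means[j]) * v[j] for j in range(len(idx)))
                        for s in sub])
    # Transpose: one output row per sample, one column per group.
    return [list(t) for t in zip(*columns)]


if __name__ == "__main__":
    data = [[1, 2, 5, 0], [2, 4, 3, 1], [3, 6, 1, 2], [4, 8, -1, 3]]
    for row in grouped_pca(data, groups=[[0, 1], [2, 3]]):
        print(row)
```

Running PCA per group means each output column is built only from attributes within one group, so the group structure of the original attributes remains visible after dimensionality reduction. The sign of each component is arbitrary.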
Keywords/Search Tags: Ensemble learning, Feature extraction, PCA, ICA, SMOTE algorithm