
Ensemble Selection Of Decision Trees And Applications In Unbalanced Text Classification

Posted on: 2022-05-09
Degree: Master
Type: Thesis
Country: China
Candidate: K Yu
Full Text: PDF
GTID: 2518306488966659
Subject: Engineering
Abstract/Summary:
Classification is an important task in machine learning and data mining, and the decision tree is a commonly used classification model. A decision tree is an unstable tree-structured model. To improve stability, ensemble learning combines multiple trees into an ensemble by voting or other methods. The prediction of an ensemble is more stable and accurate than that of a single tree, and its generalization ability is better. However, an ensemble contains redundant members, which may decrease its prediction accuracy. Ensemble selection aims to remove this redundancy and improve the performance of the ensemble at lower computational cost. Since both accuracy and diversity are important in an ensemble of decision trees, a selected sub-ensemble composed of accurate and diverse decision trees can improve classification accuracy on unseen samples.

Text classification is a fundamental task in natural language processing and text information mining. Decision trees are easy to interpret and are therefore widely used in text classification. This thesis focuses on the ensemble selection of decision trees and its application to unbalanced text classification. First, we propose a diversity measure that considers both the semantics and the structures of decision trees. Second, we analyze the recently proposed KFHE (Kalman Filter Heuristic Ensemble) method for decision tree ensembles, and then select a sub-ensemble of KFHE to improve its performance. Finally, we use the selected ensemble to classify unbalanced text. The main research covers the following aspects:

(1) A method named weighted Jaccard distance (WJD) is proposed for decision tree ensemble selection. It measures the diversity of decision trees based on their structures and their predictions on the validation set. We analyze the properties of WJD and then employ WJD-based hierarchical clustering to select decision trees for an ensemble. Experiments on UCI datasets demonstrate that WJD is an effective diversity measure and that the sub-ensemble selected with WJD obtains better classification accuracy.

(2) KFHE is a recently proposed method for building decision tree ensembles. We establish properties of KFHE through three theorems with proofs. The theorems indicate that KFHE contains redundant members. This motivates us to propose an Order-based Kalman Filter Selective Ensemble (OKFSE) to select a sub-ensemble. The experimental results show that OKFSE has better prediction performance and is more robust on datasets with noise.

(3) This thesis studies the application of decision tree ensemble selection to unbalanced text classification, and uses various decision tree ensembles and pruning methods to conduct experiments on unbalanced text datasets. Finally, we analyze and discuss the experimental results.
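The clustering-based selection idea in aspect (1) can be illustrated with a minimal sketch. The abstract does not define WJD, so the code below swaps in the plain, unweighted Jaccard distance between the sets of validation samples each tree misclassifies; it ignores tree structure entirely and is not the thesis's measure. The function names, the average-linkage choice, the "keep the most accurate tree per cluster" rule, and the toy predictions are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def error_set(preds, labels):
    """Indices of validation samples that one ensemble member misclassifies."""
    return {i for i, (p, y) in enumerate(zip(preds, labels)) if p != y}


def jaccard_distance(a, b):
    """Plain Jaccard distance between two sets (0.0 when both are empty)."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def cluster_members(dist, k):
    """Average-linkage agglomerative clustering of members down to k clusters."""
    clusters = [[i] for i in range(len(dist))]

    def linkage(pair):
        a, b = pair
        total = sum(dist[i][j] for i in clusters[a] for j in clusters[b])
        return total / (len(clusters[a]) * len(clusters[b]))

    while len(clusters) > k:
        # Merge the two clusters with the smallest average pairwise distance.
        a, b = min(combinations(range(len(clusters)), 2), key=linkage)
        clusters[a] += clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    return clusters


def select_sub_ensemble(all_preds, labels, k):
    """Keep the member with the fewest validation errors from each cluster."""
    errors = [error_set(p, labels) for p in all_preds]
    dist = [[jaccard_distance(a, b) for b in errors] for a in errors]
    clusters = cluster_members(dist, k)
    return sorted(min(c, key=lambda m: len(errors[m])) for c in clusters)


def majority_vote(all_preds, members):
    """Combine the selected members' predictions by simple majority voting."""
    n = len(all_preds[0])
    return [
        Counter(all_preds[m][i] for m in members).most_common(1)[0][0]
        for i in range(n)
    ]


# Hypothetical validation labels and per-tree predictions (toy data).
labels = [0, 1, 0, 1, 1, 0]
all_preds = [
    [0, 1, 0, 1, 0, 0],  # tree 0
    [0, 1, 0, 1, 0, 1],  # tree 1: errors overlap with tree 0
    [1, 1, 0, 1, 1, 0],  # tree 2
    [1, 0, 0, 1, 1, 0],  # tree 3: errors overlap with tree 2
    [0, 1, 1, 1, 1, 0],  # tree 4
]

selected = select_sub_ensemble(all_preds, labels, k=3)
voted = majority_vote(all_preds, selected)
print(selected, voted)
```

In this toy setting, trees whose errors overlap end up in the same cluster, so the sub-ensemble keeps one representative of each error pattern. A faithful implementation would replace `jaccard_distance` with the weighted, structure-aware measure defined in the thesis.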
Keywords/Search Tags: decision tree, ensemble, ensemble selection, text classification