
Research on the Improvement of Associative Classification Algorithms and Feature Selection for Multi-label Classification

Posted on: 2021-02-16    Degree: Master    Type: Thesis
Country: China    Candidate: J H Li    Full Text: PDF
GTID: 2428330629480603    Subject: Computer technology
Abstract/Summary:
Classification is an important research direction in machine learning. It builds a classification model by learning from data, and the model is then used to predict the classes of new instances. The traditional classification problem is single-label classification, in which the algorithm assigns exactly one label to each sample as its class. According to the characteristics of the data set, classification problems can be divided into balanced and imbalanced data classification problems. Accuracy is an important measure of classification performance: the higher the classification accuracy, the better the algorithm performs. Associative classification is a classical classification approach with the advantages of abundant rules and high classification accuracy. However, associative classification algorithms generate many rules, of which only a few are high-quality, especially on imbalanced data sets; they cannot extract high-quality minority-class rules effectively, and they cannot achieve good overall accuracy and good minority-class performance at the same time. In addition, many data sets attach more than one label to each sample; such data sets are called multi-label data sets. Classifying a multi-label data set means selecting, as completely as possible, all the labels relevant to each sample. However, multi-label data sets are large and high-dimensional, which makes direct learning difficult, so their dimensionality must be reduced. Some traditional feature selection algorithms do not retain enough relevant features for some labels, which makes it hard for a classifier to learn effectively from the data set after feature selection. This thesis addresses these problems with the following three pieces of research at the algorithmic level.

First, traditional associative classification algorithms generate many redundant rules and few high-quality rules, which easily leads to new instances being misclassified. To solve this problem, we propose an improved associative classification algorithm based on multiple learning and correlation degree (IAMC). When extracting rules, IAMC uses a new measure, the associative degree, and it learns repeatedly from random samples of the training set, which yields a large number of rules and effectively improves rule quality. After the associative classification rules have been extracted, a decision tree is used to derive additional rules for the training instances whose classes cannot be judged by the existing rules, and these new rules are added to the rule set. The experimental results show that IAMC effectively improves classification accuracy on multiple UCI data sets.
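To make the multiple-learning idea concrete, the following Python sketch mines class association rules from several random samples of the training set and pools the best-scoring ones. It is a minimal sketch under stated assumptions: the lift-style stand-in for the thesis's "associative degree", the thresholds, and the function names (mine_rules, multiple_learning) are illustrative, not the exact IAMC definitions, and the decision-tree fallback for uncovered instances is omitted.

import random
from collections import Counter
from itertools import combinations

def mine_rules(transactions, labels, min_support=0.1, min_degree=1.0, max_len=2):
    """Mine class association rules (itemset -> label) whose support and
    lift-style correlation degree clear the given thresholds."""
    n = len(transactions)
    label_freq = Counter(labels)
    items = sorted({item for t in transactions for item in t})
    rules = []
    for size in range(1, max_len + 1):
        for itemset in combinations(items, size):
            covered = [i for i, t in enumerate(transactions) if set(itemset) <= set(t)]
            if len(covered) / n < min_support:
                continue
            for label, count in Counter(labels[i] for i in covered).items():
                confidence = count / len(covered)
                # lift used here as a stand-in for the thesis's "associative degree"
                degree = confidence / (label_freq[label] / n)
                if degree >= min_degree:
                    rules.append((itemset, label, confidence, degree))
    return rules

def multiple_learning(transactions, labels, rounds=5, sample_ratio=0.8, seed=0):
    """Repeat rule mining on random samples of the training set and pool the rules,
    keeping the highest-degree version of each (itemset, label) pair."""
    rng = random.Random(seed)
    indices = list(range(len(transactions)))
    pooled = {}
    for _ in range(rounds):
        sample = rng.sample(indices, int(sample_ratio * len(indices)))
        sampled_t = [transactions[i] for i in sample]
        sampled_y = [labels[i] for i in sample]
        for itemset, label, confidence, degree in mine_rules(sampled_t, sampled_y):
            key = (itemset, label)
            if key not in pooled or pooled[key][1] < degree:
                pooled[key] = (confidence, degree)
    return pooled

# Toy usage: three itemized samples with two classes.
print(multiple_learning([["a", "b"], ["a", "c"], ["b", "c"]], ["x", "x", "y"],
                        rounds=3, sample_ratio=1.0))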
Second, traditional associative classification algorithms struggle to extract minority-class rules effectively and find it difficult to balance overall accuracy with minority-class performance. To solve this problem, we propose an improved associative classification algorithm based on class support thresholds (ACCS). The algorithm sets a class support threshold for each class according to the size of that class in the training set, and it uses each class's own threshold to extract the association rules of that class independently. To raise the priority of minority-class rules, the algorithm sorts rules using two measures. The experimental results show that, on imbalanced data sets, ACCS not only achieves higher overall classification accuracy but also effectively improves the classification performance on minority classes.

Third, traditional multi-label feature selection algorithms do not consider the relevance between labels, so the feature subsets they generate can be unreasonable. To solve this problem, we propose a multi-label feature selection algorithm based on label relevance (MILR). Using mutual information as the measure, the algorithm selects the labels that are highly correlated with the other labels as important labels, and then computes the correlation between each label and each feature. For each important label, the features are ranked by their relevance to that label and the most relevant ones are selected. For the unimportant labels, the algorithm selects the features that are simultaneously relevant to more than half of those labels. Finally, all the selected features are combined and redundant features are removed once more to form the feature subset. The experimental results show that the algorithm not only effectively selects important features and reduces the dimensionality of the original data set, but also achieves high classification accuracy.
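As a rough illustration of a label-relevance-driven selection in the spirit of MILR, the sketch below scores labels by their mutual information with the other labels, keeps the most relevant features of the important labels, and adds features relevant to more than half of the unimportant labels. The top-k choice, the median cut-off, and the helper name select_features are assumptions made for demonstration, not the thesis's exact procedure, and the final redundancy-removal step is omitted.

import numpy as np
from sklearn.metrics import mutual_info_score

def select_features(X, Y, n_important=2, k_per_label=3):
    """X: (n_samples, n_features) matrix of discrete features.
       Y: (n_samples, n_labels) binary label matrix.
       Returns the indices of the selected features."""
    n_features, n_labels = X.shape[1], Y.shape[1]

    # 1. score each label by its total mutual information with the other labels
    label_scores = np.array([
        sum(mutual_info_score(Y[:, i], Y[:, j]) for j in range(n_labels) if j != i)
        for i in range(n_labels)
    ])
    important = set(np.argsort(label_scores)[::-1][:n_important].tolist())
    unimportant = [j for j in range(n_labels) if j not in important]

    selected = set()
    # 2. for each important label, keep its top-k most relevant features
    for lab in important:
        relevance = np.array([mutual_info_score(X[:, f], Y[:, lab]) for f in range(n_features)])
        selected.update(np.argsort(relevance)[::-1][:k_per_label].tolist())

    # 3. keep features relevant to more than half of the unimportant labels
    if unimportant:
        votes = np.zeros(n_features)
        for lab in unimportant:
            relevance = np.array([mutual_info_score(X[:, f], Y[:, lab]) for f in range(n_features)])
            votes += relevance > np.median(relevance)   # crude per-label "relevant" cut-off
        selected.update(np.where(votes > len(unimportant) / 2)[0].tolist())

    return sorted(selected)

# Toy usage with discretised features and a two-label binary label matrix.
X = np.array([[0, 1, 0], [1, 1, 0], [0, 0, 1], [1, 0, 1]])
Y = np.array([[1, 0], [1, 1], [0, 1], [0, 0]])
print(select_features(X, Y, n_important=1, k_per_label=2))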
Keywords/Search Tags: data mining, multi-label classification, associative classification, feature selection