Research On Key Technologies Of Multi-label Classification

Posted on: 2017-10-21  Degree: Master  Type: Thesis
Country: China  Candidate: C R Yue  Full Text: PDF
GTID: 2348330488485687  Subject: Computer application technology
Abstract/Summary:
With the development of the information age, data of all kinds are growing rapidly. Faster and more accurate data classification is currently a research focus in data mining and machine learning. According to the number of labels, data can be divided into single-label data and multi-label data. Because features and labels are correlated with one another, classifying multi-label data is more complex than classifying single-label data, and multi-label classification has therefore become a new research hotspot. This thesis mainly studies feature selection and classification algorithms for multi-label classification.

In view of the sparsity and high dimensionality of multi-label data, this thesis proposes MA-MLFS, a new feature selection method based on a memetic algorithm, to reduce the dimensionality of multi-label data. The study concentrates on the local search strategy of the memetic algorithm. In each iteration, the local search selects the chromosome with the best fitness and applies local "ADD" and "DEL" operations to it, choosing the features to add or delete according to the strength of the correlation between each feature and the label set. Among the newly generated chromosomes, the one with the best fitness replaces the original chromosome if it is superior to it, so that a local optimum is found and the population is improved. The proposed method effectively avoids the tendency of the genetic algorithm to fall into local optima.

Because each label has its own characteristics, this thesis designs LC-KNN, a multi-label classification algorithm based on the label characteristics of multi-label data. First, the K-means clustering algorithm is used to cluster the positive samples and the negative samples of each label in the training set separately, producing the same number of cluster centers for each group. The problem of predicting whether a sample carries a label is thereby turned into a binary classification problem in which the positive and negative cluster centers of the label serve as the training set. The KNN classification algorithm is then used to classify the samples. The distance between the sample to be predicted and the positive samples is weighted so that the difference between its distances to the positive and to the negative samples becomes more pronounced. Finally, the classification results for the individual labels are combined to obtain the label set of the predicted sample. The algorithm not only makes full use of the characteristics of the labels but also effectively avoids the impact of imbalanced label distribution on the classification results.

MA-MLFS and two other feature selection methods, GA and FSIG, are used to select features on multi-label data sets, and the ML-KNN multi-label classification algorithm is then used for classification. The comparison shows that, across the different data sets, the average precision obtained with MA-MLFS feature selection is 2% to 5% higher than with the other two methods, which verifies that MA-MLFS is effective. The LC-KNN algorithm is then used to classify the multi-label data sets after MA-MLFS feature selection. Compared with ML-KNN, LC-KNN increases the average classification accuracy by about 2%, which shows that the algorithm is feasible and effective.
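The LC-KNN steps described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation: the function names (`kmeans`, `fit`, `predict`), the parameters `n_centers`, `k`, and `pos_weight`, and in particular the form of the weighting (scaling the distance to positive centers by a constant factor below 1) are assumptions made for illustration, since the abstract does not specify them. The sketch also assumes every label has at least one positive and one negative training sample.

```python
import math
import random

def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns k centroids as lists of floats."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centers[j]))
            groups[nearest].append(p)
        for j, g in enumerate(groups):
            if g:  # keep the previous center if a cluster becomes empty
                centers[j] = [sum(col) / len(g) for col in zip(*g)]
    return centers

def fit_label(X, y, n_centers=2):
    """Cluster the positive and negative samples of one label separately."""
    pos = [x for x, t in zip(X, y) if t == 1]
    neg = [x for x, t in zip(X, y) if t == 0]
    # Same number of centers on both sides, as the abstract describes.
    k = min(n_centers, len(pos), len(neg))
    return kmeans(pos, k), kmeans(neg, k)

def fit(X, Y, n_centers=2):
    """One (positive centers, negative centers) pair per label column of Y."""
    n_labels = len(Y[0])
    return [fit_label(X, [row[j] for row in Y], n_centers)
            for j in range(n_labels)]

def predict_label(x, pos_centers, neg_centers, k=3, pos_weight=0.9):
    """KNN vote over cluster centers for a single label."""
    # Assumed weighting: shrink distances to positive centers by pos_weight.
    scored = [(pos_weight * math.dist(x, c), 1) for c in pos_centers]
    scored += [(math.dist(x, c), 0) for c in neg_centers]
    nearest = sorted(scored)[:k]
    votes = sum(label for _, label in nearest)
    return 1 if 2 * votes > len(nearest) else 0

def predict(x, model, k=3, pos_weight=0.9):
    """Combine the per-label binary decisions into a label vector."""
    return [predict_label(x, pc, nc, k, pos_weight) for pc, nc in model]

# Toy usage: label 0 marks points near the origin, label 1 the far cluster.
X = [[0, 0], [0, 1], [1, 0], [1, 1], [10, 10], [10, 11], [11, 10], [11, 11]]
Y = [[1, 0]] * 4 + [[0, 1]] * 4
model = fit(X, Y)
print(predict([0.5, 0.5], model))  # prints [1, 0]
```

Because each label is classified against an equal number of positive and negative centers rather than against the raw samples, a rare label is not outvoted simply because it has few positive examples, which is the imbalance effect the abstract refers to.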
Keywords/Search Tags: multi-label classification, feature selection method, classification algorithm, memetic algorithm, label characteristics