Research On Semi-supervised Multi-label Classification Algorithm Based On Degree Of Association

Posted on: 2020-06-15
Degree: Master
Type: Thesis
Country: China
Candidate: Q J Song
GTID: 2438330596497554
Subject: Computer technology
Abstract/Summary:
With the advent of the big data era, multi-label classification of massive high-dimensional data has become a new research hotspot. Most existing multi-label classification algorithms assume that the data are independently distributed and do not consider the relationships between data. In real life, however, data are almost always related to one another, explicitly or implicitly. In addition, labeled data are expensive to acquire, which poses new challenges for multi-label classification algorithms.

To address these problems, this thesis proposes a semi-supervised multi-label classification method based on correlation degree. The basic idea is to first reduce the dimensionality of the multi-label data and remove redundancy, then assign labels to unlabeled data by semi-supervised label propagation and obtain the external correlation degree between data attributes and labels, then calculate the internal correlation degree between labels with the Kulc measure, and finally construct the classification model and its algorithm on commonly used data sets. The experimental results show that the proposed method is reasonable and feasible, and that it improves the performance of multi-label classification. The specific research contents are as follows:

1. The concept of multi-label classification and several classical multi-label algorithms are reviewed. The main ideas of these methods, together with their advantages and disadvantages, are summarized.

2. In a big data environment, data suffer from the curse of dimensionality and contain a great deal of redundancy. To deal with this, the thesis introduces principal component analysis (PCA) and linear discriminant analysis (LDA) to reduce the dimensionality of the data and remove redundancy. Experiments show that both methods effectively improve the performance of multi-label classification. LDA, unlike PCA, can make use of label information: the data projected into the low-dimensional space are densely distributed within each class and sparsely distributed between classes, which is more conducive to the subsequent multi-label classification.

3. To capture the relationships between labels, the Kulc correlation degree is used to calculate the correlation between labels. A large amount of unlabeled data is introduced and, following the idea of semi-supervised learning, a connected graph is constructed and a soft label matrix is built by label propagation. This yields the external correlation degree between data attributes and labels. At the same time, it assigns labels to the large amount of unlabeled data, which further improves the accuracy of the internal correlation degree between labels. The internal and external correlation degrees are then integrated into the multi-label classification algorithm to form an algorithm (RSML) that considers both the relationship between data and labels and the relationships among labels.

4. Through comparative experiments, the two dimensionality reduction methods are compared in terms of how much each improves the performance of RSML, which yields the best-performing algorithm, LDRSML. LDRSML is then compared with common multi-label classification algorithms on commonly used data sets. The comparative results show that LDRSML improves classification performance.
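
The two correlation steps described in the abstract can be illustrated with a minimal sketch. It is not the thesis's RSML implementation. It assumes that "Kulc" refers to the Kulczynski measure, Kulc(a, b) = 0.5 * (P(a | b) + P(b | a)), and it substitutes a standard clamped label-propagation scheme on a fully connected Gaussian-kernel graph for the semi-supervised label transfer (label propagation) step. The function names `kulc_matrix` and `propagate_labels`, the kernel width `sigma`, and the iteration count `n_iter` are hypothetical choices made only for illustration.

```python
import numpy as np


def kulc_matrix(Y):
    """Pairwise Kulczynski measure between labels (assumed meaning of "Kulc").

    Y is an (n_samples, n_labels) binary indicator matrix.
    Entry [a, b] of the result is 0.5 * (P(a | b) + P(b | a)).
    """
    Y = np.asarray(Y, dtype=float)
    co = Y.T @ Y                       # co[a, b]: samples carrying both a and b
    counts = np.diag(co).copy()        # counts[a]: samples carrying label a
    safe = np.where(counts > 0, counts, 1.0)  # avoid dividing by zero
    p_a_given_b = co / safe[np.newaxis, :]    # co[a, b] / counts[b]
    p_b_given_a = co / safe[:, np.newaxis]    # co[a, b] / counts[a]
    return 0.5 * (p_a_given_b + p_b_given_a)


def propagate_labels(X, Y_labeled, sigma=1.0, n_iter=50):
    """Clamped label propagation; returns a soft label matrix.

    The first len(Y_labeled) rows of X are the labeled samples,
    the remaining rows are unlabeled. This is a generic scheme,
    not necessarily the one used in the thesis.
    """
    X = np.asarray(X, dtype=float)
    Y_labeled = np.asarray(Y_labeled, dtype=float)
    n_labeled = Y_labeled.shape[0]

    # Gaussian-kernel affinities on a fully connected graph, no self-loops.
    sq_dist = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    W = np.exp(-sq_dist / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    P = W / W.sum(axis=1, keepdims=True)   # row-normalised transition matrix

    F = np.zeros((X.shape[0], Y_labeled.shape[1]))
    F[:n_labeled] = Y_labeled
    for _ in range(n_iter):
        F = P @ F                  # spread label mass along graph edges
        F[:n_labeled] = Y_labeled  # clamp the known labels
    return F


if __name__ == "__main__":
    # Toy data: two labeled points, two unlabeled points near each of them.
    X = [[0.0, 0.0], [5.0, 5.0], [0.1, 0.0], [5.0, 5.1]]
    Y_known = [[1, 0], [0, 1]]
    soft = propagate_labels(X, Y_known)
    hard = (soft >= 0.5).astype(int)   # threshold soft labels to hard labels
    print(kulc_matrix(hard))
```

In this sketch the soft label matrix plays the role of the external association between samples and labels, and thresholding it gives extra labeled rows from which the label-to-label Kulc matrix can be re-estimated. How the thesis actually combines the two quantities inside RSML is not specified in the abstract.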
Keywords/Search Tags:multi-label, k-nearest neighbor, correlation degree, semi-supervised