Research On Multi-label Classification Algorithm Based On KNN

Posted on: 2018-10-24
Degree: Master
Type: Thesis
Country: China
Candidate: J Wang
GTID: 2348330515955906
Subject: Control theory and control engineering
Abstract/Summary:
Classification is the process of assigning given data to predefined categories. It is an important branch of data mining and machine learning, with extensive applications and a large body of research. Depending on whether each data item belongs to a single class or to several classes, classification is divided into single-label classification and multi-label classification. Because multiple labels can describe real-world objects more precisely, multi-label classification is more widely applicable than single-label classification. At present, multi-label classification is mainly used in text categorization, bioinformatics, scene classification, automated audio annotation, video clip classification, and similar domains. Despite these applications, multi-label classification is inherently difficult: labels are correlated, multi-label data are complex to represent, and the output space grows exponentially with the number of labels. Further study and integration of existing theories and algorithms is therefore needed to improve the performance of multi-label classification algorithms. The work comprises three parts.

(1) Analysis of relevant theories. The relevant theories of single-label classification are first introduced briefly; the theories and methods of multi-label classification are then described and analyzed concisely.

(2) Improved algorithm based on MLKNN (Multi-Label K-Nearest Neighbor). KNN (K-Nearest Neighbor) is a simple and effective classification algorithm that has been applied to multi-label classification to some extent. After analyzing the disadvantages of MLKNN, this thesis proposes an improved MLKNN algorithm. For every input sample, the algorithm first uses KNN to find its k nearest neighbors and computes the prior and posterior probability of each label for that sample, then takes the largest probability for each label. The label probabilities of each sample are appended to its feature vector to express local label correlation, and the resulting data set is used to train the classification model. Comparative experiments show that the proposed method achieves better classification performance.

(3) Classification algorithm based on a multi-instance representation of data. Many existing classification algorithms represent each data item with a single instance, train a classifier, and then use the trained model to predict the label set of unlabeled data, without exploiting the rich information contained in the data. In view of this, this thesis proposes a multi-label classification algorithm based on a multi-instance representation of data. For each input sample, the algorithm first uses KNN to find its k nearest neighbors, and for every label a corresponding prototype vector is obtained as the arithmetic mean of all data in the k nearest neighbors. The differences between the input sample and each prototype vector are treated as the instances for the corresponding label, so that every sample is represented by multiple instances in the new training set. Finally, the new training set is used to train the classification model. Experiments demonstrate the effectiveness of the proposed algorithm.
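The two data transformations described in parts (2) and (3) can be illustrated with a minimal pure-Python sketch. It is not the thesis's implementation: the abstract does not give the exact procedure, so the posterior below uses the standard ML-KNN estimate with a smoothing constant `s`, the prototype for a label is taken as the mean of the neighbors that carry that label, and the fallback to the mean of all neighbors when none does is an assumption. All function names are hypothetical.

```python
def nearest_neighbours(X, i, k):
    """Indices of the k points closest to X[i] (squared Euclidean), excluding i."""
    def sq_dist(j):
        return sum((a - b) ** 2 for a, b in zip(X[i], X[j]))
    return sorted((j for j in range(len(X)) if j != i), key=sq_dist)[:k]


def mlknn_posteriors(X, Y, k, s=1.0):
    """Standard ML-KNN posterior P(label present | neighbour count), per sample and label."""
    n, q = len(X), len(Y[0])
    nbrs = [nearest_neighbours(X, i, k) for i in range(n)]
    # counts[i][l] = number of sample i's neighbours that carry label l
    counts = [[sum(Y[j][l] for j in nbrs[i]) for l in range(q)] for i in range(n)]
    post = [[0.0] * q for _ in range(n)]
    for l in range(q):
        prior1 = (s + sum(Y[i][l] for i in range(n))) / (2 * s + n)
        c1 = [0] * (k + 1)  # histogram of neighbour counts for samples with label l
        c0 = [0] * (k + 1)  # same histogram for samples without label l
        for i in range(n):
            (c1 if Y[i][l] else c0)[counts[i][l]] += 1
        for i in range(n):
            j = counts[i][l]
            like1 = (s + c1[j]) / (s * (k + 1) + sum(c1))
            like0 = (s + c0[j]) / (s * (k + 1) + sum(c0))
            num = prior1 * like1
            post[i][l] = num / (num + (1 - prior1) * like0)
    return post


def augment_with_posteriors(X, Y, k):
    """Part (2): append each sample's per-label posteriors to its feature vector."""
    return [list(x) + p for x, p in zip(X, mlknn_posteriors(X, Y, k))]


def prototype_differences(X, Y, i, k):
    """Part (3): one instance per label, the difference between X[i] and that label's prototype."""
    nbrs = nearest_neighbours(X, i, k)
    dim = len(X[i])
    instances = []
    for l in range(len(Y[0])):
        # Assumption: fall back to all neighbours when none carries label l.
        members = [j for j in nbrs if Y[j][l]] or nbrs
        proto = [sum(X[j][a] for j in members) / len(members) for a in range(dim)]
        instances.append([X[i][a] - proto[a] for a in range(dim)])
    return instances
```

Here `X` is a list of feature vectors and `Y` a list of 0/1 label vectors of equal length. The augmented vectors from `augment_with_posteriors` or the per-label instances from `prototype_differences` would then be passed to whatever base classifier is chosen; the abstract does not say which one the thesis uses.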
Keywords/Search Tags:Classification, Multi-Label Classification, KNN, Label Correlation, Multi-Instance