
Research On Multi-label Learning Algorithm Based On K Nearest Neighbor

Posted on: 2015-01-27
Degree: Master
Type: Thesis
Country: China
Candidate: Q Yu
Full Text: PDF
GTID: 2268330428968664
Subject: Computer application technology
Abstract/Summary:
Instance classification has long been a hot research topic in data mining. Traditional classification is single-label classification, in which each instance is associated with exactly one label. In real applications, however, an instance is often associated with more than one label; handling this kind of problem is known as multi-label learning. Because of this particularity, traditional classification methods cannot be applied directly to multi-label data, so a variety of methods have been proposed for multi-label classification. These methods fall broadly into two categories: problem-transformation methods and algorithm-adaptation methods. Problem-transformation methods convert multi-label data into single-label data through some strategy and then apply single-label classification. Algorithm-adaptation methods modify existing single-label classification methods so that they can handle multi-label data directly.

This thesis studies multi-label classification methods based on K nearest neighbors. The main work is as follows:

1: An existing multi-label algorithm based on lazy learning, the improved multi-label learning algorithm based on lazy learning (IMLLA), builds the nearest-neighbor set of each instance with a fixed neighborhood size k. Choosing neighbors this way does not fully consider the distribution of the instances. To address this shortcoming, we incorporate the idea of Granular Computing into IMLLA and propose a new multi-label learning algorithm based on Granular Computing (GMLLA). GMLLA fully considers the instance distribution when building each instance's nearest-neighbor set.
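The abstract does not specify GMLLA's exact granulation rule. As a hedged illustration only, one way a neighborhood can adapt to the local instance distribution is to admit every instance inside a data-driven radius instead of taking a fixed k; the function name, the median-based radius, and the size cap below are illustrative assumptions, not the thesis's actual design:

```python
import numpy as np

def granular_neighbors(X, i, radius_scale=1.0, k_max=10):
    # Hypothetical sketch: instead of a fixed k, admit every instance whose
    # distance to instance i falls inside a granularity-controlled radius.
    d = np.linalg.norm(X - X[i], axis=1)
    d[i] = np.inf                                  # exclude the instance itself
    radius = radius_scale * np.median(d[np.isfinite(d)])
    idx = np.where(d <= radius)[0]                 # all instances in the granule
    # cap the granule so dense regions do not yield huge neighbor sets
    return idx[np.argsort(d[idx])][:k_max]
```

Under such a rule, instances in dense regions get tight, highly similar neighbor sets, while a fixed k would pull in distant points for isolated instances.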
The nearest-neighbor set of each instance is built under the control of the granularity, so the instances in each set have higher mutual similarity. Experimental results show that the overall performance of GMLLA is superior to that of IMLLA.

2: An existing multi-label algorithm based on random walk, the multi-label learning algorithm based on random walk (MLRW), connects all instances that share a label when building the random-walk graph, which produces a large number of edges. So many edges make the random-walk convergence procedure relatively complex, and thus the algorithm's time complexity relatively high. In this thesis we combine the K-nearest-neighbor method from machine learning with random walk and propose a new multi-label learning algorithm based on K nearest neighbors and random walk (ML-RWKNN). First, we find the k nearest neighbors of each training instance and build a KNN graph over the training set. Second, for each test instance we find its k nearest neighbors in the training set and build a KNN-based random-walk graph; the random walk for the test instance is performed on this graph. After the walk converges, we obtain a stable probability distribution vector, from which we get the probability that the test instance has each label. Finally, we give a corresponding threshold selection method that yields a threshold vector; comparing each label's probability with the threshold vector determines whether the test instance has that label. Analysis shows that combining KNN with random walk effectively reduces the time complexity of random-walk-based multi-label learning.

Finally, we summarize the work of this thesis and give an outlook on future research.
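The abstract describes the ML-RWKNN pipeline only at a high level. The following is a minimal, hypothetical sketch of that idea, assuming a similarity-weighted random walk with restart restricted to the test instance's k nearest training neighbors; the function name, the similarity weighting, and the fixed threshold (standing in for the thesis's unspecified threshold selection method) are illustrative assumptions:

```python
import numpy as np

def mlrw_knn_predict(X_train, Y_train, x_test, k=3, alpha=0.85,
                     thresh=0.5, n_iter=100):
    # Hypothetical sketch of the ML-RWKNN idea: random-walk label scores
    # computed only over the test instance's k nearest training neighbors.
    d = np.linalg.norm(X_train - x_test, axis=1)
    nbrs = np.argsort(d)[:k]                       # k nearest training neighbors
    sim = 1.0 / (1.0 + d[nbrs])                    # similarity weights
    # transition matrix over the neighbors: similarity-weighted, row-stochastic
    W = np.outer(sim, sim)
    np.fill_diagonal(W, 0.0)
    P = W / W.sum(axis=1, keepdims=True)
    # restart vector: the walk restarts at the most similar neighbors
    r = sim / sim.sum()
    p = r.copy()
    for _ in range(n_iter):                        # power iteration to convergence
        p = alpha * (p @ P) + (1 - alpha) * r
    # label probability = stationary walk mass on neighbors carrying that label
    scores = p @ Y_train[nbrs]
    return (scores >= thresh).astype(int), scores
```

Because the walk runs on a k-by-k transition matrix rather than a graph connecting all same-label instances, each convergence step is O(k^2), which illustrates the complexity reduction the abstract claims for combining KNN with random walk.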
Keywords/Search Tags: classification, multi-label, K nearest neighbor, granular computing, threshold selection