
Parallel Multi-label K-nearest Neighbor With Local Dependency

Posted on: 2019-09-05
Degree: Master
Type: Thesis
Country: China
Candidate: C P Xia
Full Text: PDF
GTID: 2428330590965753
Subject: Computer Science and Technology

Abstract/Summary:
In multi-label classification, each object is associated with a set of class labels, and the main task is to identify all the labels that may be associated with an unseen sample. Handling the exponentially large label-set space as the number of class labels grows is an essential and challenging task, yet most existing multi-label learning algorithms cannot efficiently exploit the correlations between labels to promote the learning process. Multi-Label k-Nearest Neighbor (ML-kNN), derived from the classic k-nearest neighbor algorithm, is a lazy multi-label learning method. It alleviates the class-imbalance problem in multi-label learning and inherits the advantages of lazy learning, but it ignores the correlations between labels. This thesis introduces correlations among local label subsets into the prediction stage of ML-kNN to improve the validity of the model. Moreover, by combining the advantages of distributed computing and lazy learning, it proposes a parallel version of multi-label k-nearest neighbor with local dependency, which makes the method suitable for large-scale multi-label data mining applications. The main contributions are as follows.

1. To improve the effectiveness and generalization ability of ML-kNN, a method called multi-label k-nearest neighbor with local dependency is proposed. First, to reduce the label space, label subsets exhibiting co-occurrence and mutual exclusion are selected according to mutual information (a code sketch of this selection step follows the abstract). Second, the influence of the local label subset on the distribution of the k-nearest-neighbor set is taken into account when calculating the posterior probability for each label. Finally, the number of samples satisfying the constraints is counted according to the similarity of the label-subset distributions in the nearest-neighbor sets. To verify its effectiveness, the method is compared against six classical multi-label classification methods on six multi-label datasets from different fields. Experiments show that it makes full use of label correlations to promote the learning process of the model.

2. To further improve adaptability to large-scale multi-label datasets, combining the characteristics of lazy learning with big-data processing techniques, this thesis proposes a multi-label k-nearest neighbor with local dependency algorithm under the MapReduce framework. The whole MapReduce job is completed by a chain of small tasks. This improves adaptability and scalability on big data while preserving classification accuracy.

3. To make full use of the advantages of in-memory computing and further improve performance, this thesis proposes and compares a parallel version of the algorithm on Spark, and analyzes the performance of the algorithms under different distributed frameworks from multiple perspectives (a minimal Spark sketch of the distributed neighbor search is also given below).
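The abstract does not give the exact formulation of the label-subset selection, but the idea of ranking label pairs by mutual information, so that both strongly co-occurring and strongly mutually exclusive labels score highly, can be sketched as follows. This is a minimal illustrative sketch assuming a binary label matrix Y (samples x labels); the function names and the restriction to pairs rather than larger subsets are simplifications for illustration, not the thesis's definition.

```python
import numpy as np

def label_pair_mutual_information(Y):
    """Pairwise mutual information between the binary label columns of Y
    (shape: n_samples x n_labels). Returns a symmetric n_labels x n_labels
    matrix with a zero diagonal."""
    n, q = Y.shape
    mi = np.zeros((q, q))
    eps = 1e-12  # avoid log(0) for labels that never (co-)occur
    for a in range(q):
        for b in range(a + 1, q):
            m = 0.0
            for va in (0, 1):
                for vb in (0, 1):
                    p_ab = np.mean((Y[:, a] == va) & (Y[:, b] == vb)) + eps
                    p_a = np.mean(Y[:, a] == va) + eps
                    p_b = np.mean(Y[:, b] == vb) + eps
                    m += p_ab * np.log(p_ab / (p_a * p_b))
            mi[a, b] = mi[b, a] = m
    return mi

def select_correlated_pairs(Y, top=10):
    """Return the `top` label pairs with the highest mutual information;
    co-occurring and mutually exclusive pairs both rank high."""
    mi = label_pair_mutual_information(Y)
    q = mi.shape[0]
    pairs = [(a, b) for a in range(q) for b in range(a + 1, q)]
    pairs.sort(key=lambda p: mi[p], reverse=True)
    return pairs[:top]
```

In the method described above, such high-information label subsets would then condition the ML-kNN posterior: for each label, the counts collected from the k nearest neighbors are restricted to neighbors whose local label-subset distribution matches the constraint, rather than being taken over all neighbors as in standard ML-kNN.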
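For the distributed versions (contributions 2 and 3), no code is given in this abstract. The sketch below only illustrates, on Spark, the broadcast-plus-map neighbor search that a parallel lazy learner of this kind relies on: the training set is broadcast to the executors once, test instances are scored in parallel, and each task returns the per-label positive counts among the k nearest neighbors, to which the (possibly subset-conditioned) MAP rule of ML-kNN would then be applied. The helper names and toy data are placeholders, not the thesis's implementation.

```python
import numpy as np
from pyspark.sql import SparkSession

def find_k_neighbors(x, train_X, train_Y, k=10):
    """Return the label rows of the k training instances closest to x."""
    dists = np.linalg.norm(train_X - x, axis=1)
    idx = np.argsort(dists)[:k]
    return train_Y[idx]

spark = SparkSession.builder.appName("parallel-mlknn-sketch").getOrCreate()
sc = spark.sparkContext

train_X = np.random.rand(1000, 20)           # placeholder training features
train_Y = np.random.rand(1000, 5) > 0.7      # placeholder binary label matrix
test_X = np.random.rand(200, 20)             # placeholder test features

bX = sc.broadcast(train_X)                   # ship the training set to executors once
bY = sc.broadcast(train_Y)

def neighbor_label_counts(x):
    # Count, per label, how many of the k nearest neighbors are positive;
    # the ML-kNN MAP rule is applied to these counts downstream.
    neigh_Y = find_k_neighbors(x, bX.value, bY.value, k=10)
    return neigh_Y.sum(axis=0)

counts = sc.parallelize(list(test_X)).map(neighbor_label_counts).collect()
spark.stop()
```

Broadcasting the full training set is only viable when it fits in executor memory; the MapReduce chain and the Spark pipeline described in the thesis address the large-scale case, which this toy sketch does not attempt to reproduce.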
Keywords/Search Tags: Multi-Label Classification, ML-kNN, label correlation, MapReduce, Spark