
Parallel Multi-label K-nearest Neighbor With Local Dependency

Posted on: 2019-09-05
Degree: Master
Type: Thesis
Country: China
Candidate: C P Xia
Full Text: PDF
GTID: 2428330590965753
Subject: Computer Science and Technology

Abstract/Summary:
In multi-label classification, each object is associated with a set of class labels, and the main task is to identify all the labels that may be associated with an unseen sample. Handling the exponentially large label-set space as the number of class labels grows is an essential and challenging task, yet most existing multi-label learning algorithms cannot efficiently exploit the correlations between labels to promote the learning process. Multi-Label k-Nearest Neighbor (ML-kNN), derived from the classic k-nearest neighbor algorithm, is a lazy multi-label learning method. It alleviates the class-imbalance problem in multi-label learning and inherits the advantages of lazy learning, but it ignores the correlations between labels. This thesis introduces correlations among local label subsets into the prediction stage of ML-kNN to improve the validity of the model. Moreover, by combining the advantages of distributed computing and lazy learning, it proposes a parallel version of multi-label k-nearest neighbor with local dependency, which makes the method suitable for large-scale multi-label data mining applications. The main contributions are as follows.

1. To improve the effectiveness and generalization ability of ML-kNN, a method called multi-label k-nearest neighbor with local dependency is proposed. First, to reduce the label space, label subsets exhibiting co-occurrence and mutual exclusion are selected according to mutual information (a code sketch of this selection step follows the abstract). Second, the influence of the local label subset on the distribution of the k-nearest-neighbor set is taken into account when calculating the posterior probability for each label. Finally, the number of samples satisfying the constraints is counted according to the similarity of the label-subset distributions in the nearest-neighbor sets. To verify its effectiveness, the method is compared against six classical multi-label classification methods on six multi-label datasets from different fields. Experiments show that it makes full use of label correlations to promote the learning process of the model.

2. To further improve adaptability to large-scale multi-label datasets, combining the characteristics of lazy learning with big-data processing techniques, this thesis proposes a multi-label k-nearest neighbor with local dependency algorithm under the MapReduce framework. The whole MapReduce job is completed by a chain of small tasks. This improves adaptability and scalability on big data while preserving classification accuracy.

3. To make full use of the advantages of in-memory computing and further improve performance, this thesis proposes and compares a parallel version of the algorithm on Spark, and analyzes the performance of the algorithms under different distributed frameworks from multiple perspectives (a minimal Spark sketch of the distributed neighbor search is also given below).
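The abstract does not give the exact formulation of the label-subset selection, but the idea of ranking label pairs by mutual information, so that both strongly co-occurring and strongly mutually exclusive labels score highly, can be sketched as follows. This is a minimal illustrative sketch assuming a binary label matrix Y (samples x labels); the function names and the restriction to pairs rather than larger subsets are simplifications for illustration, not the thesis's definition.

```python
import numpy as np

def label_pair_mutual_information(Y):
    """Pairwise mutual information between the binary label columns of Y
    (shape: n_samples x n_labels). Returns a symmetric n_labels x n_labels
    matrix with a zero diagonal."""
    n, q = Y.shape
    mi = np.zeros((q, q))
    eps = 1e-12  # avoid log(0) for labels that never (co-)occur
    for a in range(q):
        for b in range(a + 1, q):
            m = 0.0
            for va in (0, 1):
                for vb in (0, 1):
                    p_ab = np.mean((Y[:, a] == va) & (Y[:, b] == vb)) + eps
                    p_a = np.mean(Y[:, a] == va) + eps
                    p_b = np.mean(Y[:, b] == vb) + eps
                    m += p_ab * np.log(p_ab / (p_a * p_b))
            mi[a, b] = mi[b, a] = m
    return mi

def select_correlated_pairs(Y, top=10):
    """Return the `top` label pairs with the highest mutual information;
    co-occurring and mutually exclusive pairs both rank high."""
    mi = label_pair_mutual_information(Y)
    q = mi.shape[0]
    pairs = [(a, b) for a in range(q) for b in range(a + 1, q)]
    pairs.sort(key=lambda p: mi[p], reverse=True)
    return pairs[:top]
```

In the method described above, such high-information label subsets would then condition the ML-kNN posterior: for each label, the counts collected from the k nearest neighbors are restricted to neighbors whose local label-subset distribution matches the constraint, rather than being taken over all neighbors as in standard ML-kNN.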
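For the distributed versions (contributions 2 and 3), no code is given in this abstract. The sketch below only illustrates, on Spark, the broadcast-plus-map neighbor search that a parallel lazy learner of this kind relies on: the training set is broadcast to the executors once, test instances are scored in parallel, and each task returns the per-label positive counts among the k nearest neighbors, to which the (possibly subset-conditioned) MAP rule of ML-kNN would then be applied. The helper names and toy data are placeholders, not the thesis's implementation.

```python
import numpy as np
from pyspark.sql import SparkSession

def find_k_neighbors(x, train_X, train_Y, k=10):
    """Return the label rows of the k training instances closest to x."""
    dists = np.linalg.norm(train_X - x, axis=1)
    idx = np.argsort(dists)[:k]
    return train_Y[idx]

spark = SparkSession.builder.appName("parallel-mlknn-sketch").getOrCreate()
sc = spark.sparkContext

train_X = np.random.rand(1000, 20)           # placeholder training features
train_Y = np.random.rand(1000, 5) > 0.7      # placeholder binary label matrix
test_X = np.random.rand(200, 20)             # placeholder test features

bX = sc.broadcast(train_X)                   # ship the training set to executors once
bY = sc.broadcast(train_Y)

def neighbor_label_counts(x):
    # Count, per label, how many of the k nearest neighbors are positive;
    # the ML-kNN MAP rule is applied to these counts downstream.
    neigh_Y = find_k_neighbors(x, bX.value, bY.value, k=10)
    return neigh_Y.sum(axis=0)

counts = sc.parallelize(list(test_X)).map(neighbor_label_counts).collect()
spark.stop()
```

Broadcasting the full training set is only viable when it fits in executor memory; the MapReduce chain and the Spark pipeline described in the thesis address the large-scale case, which this toy sketch does not attempt to reproduce.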
Keywords/Search Tags: Multi-Label Classification, ML-kNN, label correlation, MapReduce, Spark