Parallel Multi-label Evolutionary Hyper-network On Spark

Posted on: 2018-10-03
Degree: Master
Type: Thesis
Country: China
Candidate: R Zhao
GTID: 2348330569986444
Subject: Computer Science and Technology
Abstract/Summary:
In recent years, multi-label learning has attracted growing attention in fields such as image recognition and text classification, and it has increasingly important potential application value in the real world. In multi-label learning, each object is associated with a set of class labels, so the key challenge is the exponential size of the prediction label space. Existing multi-label learning approaches mainly focus on improving the learning process by exploiting label correlations. Nevertheless, an intrinsic characteristic of multi-label data, namely class imbalance among labels, has not been well investigated. Besides, most multi-label learning algorithms do not work well on large-scale data sets. In the multi-label evolutionary hyper-network, each hyper-edge and its corresponding weight represent high-order relationships between feature subsets and multiple class labels, which can be used effectively to mine label correlations. Based on the multi-label evolutionary hyper-network, this thesis proposes an improved algorithm that handles label correlations and class imbalance and uses Spark's distributed parallel computing framework for large-scale data processing. The main research work of the thesis is as follows:

1. To deal with label correlations and class imbalance, this thesis proposes a modified multi-label evolutionary hyper-network based on Spark. Firstly, the model converts the traditional hyper-network into a multi-label hyper-network. Secondly, a cost-sensitive strategy is introduced into the multi-label evolutionary hyper-network to address the problem of class imbalance. Meanwhile, the replacement of hyper-edges and the gradient evolution learning process are optimized to reduce the time complexity and improve performance. Finally, the adaptability of the algorithm to large-scale data sets is improved by implementing it as a parallel computing framework on the Spark platform.

2. To further improve the performance of the proposed algorithm on large-scale data sets, an improved multi-label evolutionary hyper-network ensemble algorithm based on Spark is proposed, which combines the hyper-network structure with ensemble learning. Firstly, training clusters with similar feature spaces are constructed using a Self-Organizing Map. Secondly, for each training cluster, the proposed improved multi-label evolutionary hyper-network algorithm based on Spark is used to form a number of local multi-label hyper-networks. Finally, the local hyper-networks are combined into a new hyper-network using a selective ensemble learning method, and this hyper-network is used to predict the testing samples.

Comprehensive experiments on 12 multi-label datasets are conducted to verify the effectiveness and superiority of the proposed algorithm. On the one hand, its effectiveness is verified by comparing its prediction performance with that of state-of-the-art algorithms such as Co-MLHN. On the other hand, an efficiency analysis shows that the proposed algorithm has lower time complexity, good parallelism, and good scalability.
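
As a rough illustration of how hyper-edges and their weights can drive a cost-sensitive multi-label prediction, the following Python sketch may help. It is not the thesis's algorithm: the `HyperEdge` structure, the `predict` function, the `pos_cost` vector, and the voting rule are all hypothetical names and assumptions chosen for this example, and the evolutionary learning step and the Spark parallelization are left out entirely.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class HyperEdge:
    """Hypothetical hyper-edge: a feature subset, a label vector, per-label weights."""
    features: Dict[int, int]   # feature index -> required discrete value
    labels: List[int]          # 0/1 label vector stored with the edge
    weights: List[float]       # one weight per label


def matches(edge: HyperEdge, x: Sequence[int]) -> bool:
    """An edge matches a sample when every feature in its subset agrees."""
    return all(x[i] == v for i, v in edge.features.items())


def predict(edges: List[HyperEdge], x: Sequence[int], n_labels: int,
            pos_cost: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Weighted vote over matching edges.

    Votes for a positive label are scaled by pos_cost[j]; this is an assumed,
    simplified form of cost-sensitivity (for example, a larger cost for rare labels).
    """
    pos = [0.0] * n_labels
    total = [0.0] * n_labels
    for edge in edges:
        if not matches(edge, x):
            continue
        for j in range(n_labels):
            w = edge.weights[j]
            if edge.labels[j] == 1:
                w *= pos_cost[j]
                pos[j] += w
            total[j] += w
    # A label is predicted when its share of the weighted vote reaches the threshold.
    return [1 if total[j] > 0 and pos[j] / total[j] >= threshold else 0
            for j in range(n_labels)]


edges = [
    HyperEdge({0: 1}, [1, 0], [1.0, 1.0]),
    HyperEdge({1: 0}, [0, 1], [1.0, 1.0]),
    HyperEdge({0: 0}, [1, 1], [1.0, 1.0]),
]
sample = [1, 0, 1]  # matches the first two edges only
plain = predict(edges, sample, 2, pos_cost=[1.0, 1.0], threshold=0.6)
weighted = predict(edges, sample, 2, pos_cost=[1.0, 3.0], threshold=0.6)
```

In this toy setting, raising the positive cost of the second label lets its single supporting edge outweigh the opposing one, which is the kind of effect a cost-sensitive strategy is meant to have on under-represented labels.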
Keywords/Search Tags: multi-label learning, evolution hyper-network, label correlations, Spark, ensemble