
Optimization And Parallelization Of Multi-label Cluster Tree Classification Method

Posted on: 2014-04-02
Degree: Master
Type: Thesis
Country: China
Candidate: W Chen
GTID: 2268330392469042
Subject: Computer Science and Technology
Abstract/Summary:
In recent years, research on multi-label classification algorithms has developed rapidly. Multi-label classification algorithms are an important technology for solving multi-label classification (MLC) and label ranking (LR) tasks. To solve multi-label tasks effectively, a new multi-label classification algorithm based on a cluster tree, named Multi-Label Cluster Tree (MLCT), was proposed in 2011; experiments showed that it outperforms other state-of-the-art algorithms. However, MLCT has drawbacks. Meanwhile, with the development of the Internet, data volumes keep growing, so speeding up the algorithm on big data is also a concern. This thesis studies the problems of MLCT and how to parallelize the algorithm. Its specific contents and results are as follows:
(1) A comprehensive study and analysis of existing multi-label algorithms, summarizing the advantages and disadvantages of each, together with a summary of machine learning algorithms based on the MapReduce framework.
(2) An in-depth introduction to the multi-label cluster tree algorithm, analyzing MLCT's theory and process and pointing out its problems.
(3) Two optimization strategies to improve accuracy. The first builds binary relevance classifiers for the labels at each node of the tree. The second represents label correlation with the Pearson correlation coefficient. Both methods exploit the relationships between labels to improve precision.
(4) A distributed MLCT algorithm, proposed and implemented on the Hadoop MapReduce framework, with emphasis on the training process and the classification (prediction) process.
(5) An experimental analysis showing that both optimization strategies outperform the original MLCT, and that the MapReduce-based parallelization of MLCT achieves good performance.
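The abstract names the Pearson correlation coefficient as the measure of label correlation and Hadoop MapReduce as the parallelization framework, but it gives neither the formula nor how MLCT consumes the result. As a minimal sketch, assuming labels are stored as a binary instance-by-label matrix partitioned into data splits, the standard coefficient between two label columns can be computed from additive sums, which is the shape a MapReduce job needs: a map step summarizes each split and a reduce step adds the partial sums. The function names (`map_split`, `combine`, `label_correlation`) are hypothetical, and plain Python stands in for the Hadoop API; this is not the thesis's implementation.

```python
import math
from functools import reduce
from typing import List, Tuple

# Hypothetical sufficient statistics for one label pair (i, j):
# (n, sum_i, sum_j, sum_ij, sum_ii, sum_jj). Every component is a plain sum,
# so partial results from separate data splits can simply be added together.
Stats = Tuple[int, int, int, int, int, int]


def map_split(rows: List[List[int]], i: int, j: int) -> Stats:
    """'Map' step (illustrative): summarize one split of the binary label matrix."""
    n = si = sj = sij = sii = sjj = 0
    for row in rows:
        a, b = row[i], row[j]
        n += 1
        si += a
        sj += b
        sij += a * b
        sii += a * a
        sjj += b * b
    return (n, si, sj, sij, sii, sjj)


def combine(a: Stats, b: Stats) -> Stats:
    """'Reduce' step (illustrative): add two partial statistics component-wise."""
    return tuple(x + y for x, y in zip(a, b))


def pearson_from_stats(s: Stats) -> float:
    """Standard Pearson r = (n*Sxy - Sx*Sy) / sqrt((n*Sxx - Sx^2) * (n*Syy - Sy^2)).

    Returns 0.0 when either label is constant (zero variance); this
    convention is an assumption made for the sketch.
    """
    n, si, sj, sij, sii, sjj = s
    cov = n * sij - si * sj
    var_i = n * sii - si * si
    var_j = n * sjj - sj * sj
    if var_i == 0 or var_j == 0:
        return 0.0
    return cov / math.sqrt(var_i * var_j)


def label_correlation(splits: List[List[List[int]]], i: int, j: int) -> float:
    """Correlation between label columns i and j over all splits."""
    partials = (map_split(rows, i, j) for rows in splits)
    return pearson_from_stats(reduce(combine, partials, (0, 0, 0, 0, 0, 0)))


# Two splits of a 4-instance, 3-label binary matrix (toy data).
splits = [[[1, 1, 0], [1, 1, 0]], [[0, 0, 1], [0, 1, 1]]]
print(label_correlation(splits, 0, 2))  # -1.0: labels 0 and 2 never co-occur
```

Because each component of the statistics tuple is a sum, the result does not depend on how the instances are partitioned, which is what lets the per-split work run independently on separate mappers.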
Keywords/Search Tags: Multi-Label Classification, Cluster Tree, Parallelization Classification