The Parallel Algorithm Of The Neighbor Distance Weighted Partial Label Learning Based On MPI

Posted on: 2019-08-20
Degree: Master
Type: Thesis
Country: China
Candidate: Y Y Gao
Full Text: PDF
GTID: 2428330590465728
Subject: Computer Science and Technology
Abstract/Summary:
Nowadays, many application fields cannot obtain the exact correspondence between samples and labels because technical and human resources are limited. Examples include calligraphy classification and matching the people who appear in news items to their names. Partial label learning emerged in response to this situation and has gradually attracted the attention of scholars. It is one of the important frameworks of weakly supervised learning: it deals with how to classify unseen samples when the relationship between each training sample and its label is unclear and only a candidate label set is known for each training sample. Partial label learning has been successfully applied in fields such as ecological informatics, image classification, and web page mining, and has become a hot research topic.

With the rapid development of science and technology, the Internet age has arrived. Social software such as QQ and WeChat has hundreds of millions of users, and well-known websites such as Taobao and Jingdong have reached staggering numbers of users, so at least millions of data records are generated every day. To make full use of this massive data and extract useful information, research on partial label learning needs to consider not only algorithm performance but also the operating efficiency of classification. However, most current partial label learning methods are computationally intensive and therefore unsuitable for large-scale data. To address these problems, this thesis improves the original instance-based method and proposes a parallel model for the improved algorithm. The main research contents are as follows:

1. To address the problem that IPAL (Instance-based Partial Label learning) spends too much time computing the neighbors of each sample and the weights of those neighbors, and therefore cannot be applied to large-scale data, a new partial label learning method based on neighbor distance weighting is proposed, which changes how the neighbor weights are computed. The model uses the distances of a sample's nearest neighbors to compute their weights directly, whereas the original algorithm obtains the weights by solving a constrained least squares problem. In theory, the validity of the algorithm is shown by analyzing the time complexity of the original and the improved algorithms. In experiments on four UCI data sets and five real-world data sets, the results show that the proposed algorithm improves operating efficiency while achieving classification performance similar to that of the original algorithm.

2. To further improve the running efficiency of the neighbor distance weighted partial label learning algorithm, a parallel version of the algorithm is designed to shorten its running time. It is implemented in an MPI cluster environment: the data is divided evenly and assigned to multiple processes, which then communicate and cooperate to complete the algorithm. The thesis first argues the rationality and feasibility of the parallel model from the perspective of time complexity, and then designs comparison experiments on four large-scale data sets. The conclusion is that the serial and parallel models achieve the same classification accuracy, while the parallel model greatly improves speed and can handle large-scale data.
Keywords/Search Tags: partial label learning, parallel, large-scale data, MPI, operating efficiency
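
The two ideas in the abstract, computing neighbor weights directly from distances and dividing the data evenly among processes, can be illustrated with a minimal Python sketch. It is not the thesis's implementation. The abstract does not give the weighting formula, so the normalised inverse-distance rule below is an assumption, as are all function names and the toy data. The sketch replaces IPAL's full procedure with a single weighted vote over candidate labels, and it only simulates the data division with plain lists instead of using MPI.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def distance_weights(distances, eps=1e-12):
    """Assumed rule: weights proportional to 1/distance, normalised to sum to 1.
    No constrained least squares problem is solved."""
    inv = [1.0 / (d + eps) for d in distances]
    total = sum(inv)
    return [v / total for v in inv]

def predict(query, train_x, train_candidates, k=3):
    """Weighted vote of the k nearest training samples over their candidate labels."""
    dists = [euclidean(query, x) for x in train_x]
    nearest = sorted(range(len(train_x)), key=lambda i: dists[i])[:k]
    weights = distance_weights([dists[i] for i in nearest])
    votes = {}
    for i, w in zip(nearest, weights):
        # Spread each neighbor's weight evenly over its candidate label set.
        share = w / len(train_candidates[i])
        for label in train_candidates[i]:
            votes[label] = votes.get(label, 0.0) + share
    # Iterating over sorted labels makes tie-breaking deterministic.
    return max(sorted(votes), key=votes.get)

def split_evenly(items, n_parts):
    """Contiguous chunks whose sizes differ by at most one."""
    base, extra = divmod(len(items), n_parts)
    chunks, start = [], 0
    for rank in range(n_parts):
        size = base + (1 if rank < extra else 0)
        chunks.append(items[start:start + size])
        start += size
    return chunks

# Toy partial-label training set: each sample has a candidate label set.
train_x = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (4.8, 5.2)]
train_candidates = [{"a", "b"}, {"a"}, {"a", "c"}, {"b", "c"}, {"b"}, {"a", "b"}]
queries = [(0.1, 0.1), (5.0, 5.1), (0.3, 0.0), (4.9, 4.9), (0.0, 0.2)]

# Serial run over all queries.
serial = [predict(q, train_x, train_candidates) for q in queries]

# Simulated data-parallel run: each chunk stands in for the share of one process.
chunks = split_evenly(queries, 3)
merged = [predict(q, train_x, train_candidates) for chunk in chunks for q in chunk]

print(serial, serial == merged)
```

Because each query is classified independently of the others, processing the chunks separately and concatenating the results in rank order gives the same predictions as the serial loop, which matches the abstract's claim that the serial and parallel models reach the same accuracy. In an actual MPI program each chunk would be handled by a different process and the partial results gathered at the end.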