Font Size: a A A

Several Performance Optimizations Of Covering Algorithm And Their Implementations

Posted on:2014-01-13Degree:MasterType:Thesis
Country:ChinaCandidate:Y H WangFull Text:PDF
GTID:2248330398479205Subject:Computer application technology
Abstract/Summary:PDF Full Text Request
As a constructive model building method, CA(Covering Algorithm) is well known for its fast learning speed and excellent classification ability. Now is the era of Internet, we often meet the challenge of training and classification of large data-set, therefore promoting machine learning training speed and classification accuracy of the algorithm is of great significance.The main work of this dissertation is to promote the training speed and the classification accuracy of the CA(covering algorithm), and two improved methods are proposed, they are Parallel Covering Algorithm(Parallel CA) and OCA (Overlapped Covering Algorithm) respectively. Parallel CA decomposes the complex problem into a number of smaller independent problems. Thanks to high efficient parallel computing technique, those independent problems can be solved simultaneously. The proposition of OCA is inspired by the process of shape filling in computer graphics, and the principle is to extend the covering area around, the center of the new cover is decide by the samples that was covered before. OCA greatly promotes classification accuracy of the covering model.The work includes:1. The model creation of CA is analyzed, and the sample covering of each class can be seen as a two class covering problem, so a multi-class sample covering problem can be decomposed to a series of two class covering problems. And these sub-covering problems are mutual independent, because the cover model of a class only concerns this class. After decomposition of the problem, the sub-problems can be solved simultaneously by parallel computers. Three public datasets are used in the experiments, algorithm evaluation on a multi-core workstation indicates that the proposed Parallel CA improved the speed several times that of in batch mode. The algorithm itself does not depend on parallel computing environment, it’s also applicable to distributed clusters.2. The methodology of covering algorithm is based on that things of the same kind are flock together. An area of samples are represented by a center and a radius, so the whole samples are generalized by a set of covers. This dissertation present another view, sample is an abstraction of the real world, and its space distribution conforms to regular rules, so the covering model can be seen as an approximation of the sample space distribution. The covering model approximates to the samples more, and higher classification accuracy the model will have. By analyze whether same class covers overlap, find that when the same class covers are mutual isolated, the created model have very high classification accuracy to the samples that can be recognized, but unrecognized samples influenced the total result. When the same class covers allow to overlap each other, the unrecognized area’s decrement cause a great reduce to the unrecognized samples, but at the same time, the recognized samples’classification accuracy has a slight falls, nevertheless, due to the number of the unrecognized samples reduction, the total result is better than that of the previous.3. OCA is proposed based on the analysis of two kinds of covering methods. A queue is used to record the covered samples around, then take samples from the queue iteratively as a center to create new covers, and new covered samples are insert into the queue. In the covering process, the redundant covers are excluded from the cover set. Processing of the edged samples and the isolated samples further optimize the model. Three public dataset is used in the experiments to evaluate the performance of OCA, the result show that model created by OCA has less unrecognized area, and the classification accuracy has a significant improve on some part of the data sets.4. Finally, a software package which develops to do covering algorithm experiments is introduced, it is upload onto the internet for share. The package implements CA and OCA, provides training, testing, cross validation, random training-testing and some other function. The source code is opened, native C++ programs can either compiled in Windows or Linux operating system for researchers on different platforms.
Keywords/Search Tags:Classification, Covering algorithm, Parallel Covering Algorithm, Overlapped Covering Algorithm, Covering Software Toolkit
PDF Full Text Request
Related items