Research On Active Learning Method Based On Rough Set Theory

Posted on:2018-05-10

Degree:Master

Type:Thesis

Country:China

Candidate:Y Zhou

Full Text:PDF

GTID:2348330569986448

Subject:Computer Science and Technology

Abstract/Summary:

Active learning is an important research direction in machine learning. Existing active learning methods usually select an uncertain or representative sample for an expert to label, and then add it to the labeled data set used to train the classifier. However, these methods cannot fully use the information in the data distribution and still leave room for improvement. Traditional active learning methods have three main problems: (1) sample importance is not measured comprehensively; (2) the spatial structure information of the samples is not fully used; (3) outliers are not handled well. Rough set theory offers a way to deal with uncertainty: it can acquire knowledge directly from the data distribution. This thesis studies active learning based on rough set theory and makes the following contributions.

(1) An active learning method based on rough sets is proposed. First, building on neighborhood rough set reduction applied to the neighborhoods of unlabeled samples, the thesis proposes an algorithm that handles outliers through their neighborhoods and can be used to preprocess the sample set effectively. Second, neighborhood rough set theory is combined with active learning to calculate the uncertainty and generalization of unlabeled samples, which together measure sample importance. This allows the more important samples to be selected in each iteration of active learning, which improves the performance of the model. On this basis, a rough-set-based active learning method for unlabeled samples is proposed.

(2) An active learning sample reduction algorithm based on the Spark platform is proposed for large-scale sample reduction. Active learning based on neighborhood rough sets is inefficient when computed serially, so the thesis uses Spark's in-memory iterative computing to build a Spark-based active learning sample reduction method. Applied to the reduction of large-scale labeled sample sets, active learning filters noisy data effectively and achieves the same performance on the reduced sample set as on the full one. The actual complexity of the improved algorithm is greatly reduced, and its efficiency is higher than that of the original algorithm.

For the rough-set-based active learning method, experiments on UCI data sets show that the proposed algorithm, which combines sample uncertainty calculation, representativeness calculation, and outlier selection, makes full use of the information in the data distribution and is an effective solution for sample selection in active learning. It outperforms the active learning methods from the literature in accuracy and other indicators. In addition, the experimental results show that the parallel algorithm not only preserves the validity of the algorithm but also greatly reduces its execution time.
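The abstract does not give formulas, so the following is only a minimal sketch of how neighborhood-based sample selection might look, not the thesis's actual algorithm. It assumes a Euclidean neighborhood of radius `delta`, uses the label entropy of labeled neighbors as the uncertainty measure, uses the share of nearby unlabeled samples as a stand-in for representativeness, and treats samples with an empty neighborhood as outliers. The function names, the scoring rule (the product of the two measures), and the `delta` parameter are all hypothetical.

```python
import math
from collections import Counter


def distance(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def neighborhood(center, points, delta):
    """Indices of the points lying within radius delta of center."""
    return [i for i, p in enumerate(points) if distance(center, p) <= delta]


def label_entropy(labels):
    """Shannon entropy (in bits) of a non-empty list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def select_sample(labeled, labels, unlabeled, delta):
    """Return the index of the highest-scoring unlabeled sample, or None.

    Hypothetical score = uncertainty * representativeness (an assumption,
    not taken from the thesis).
    """
    best_index, best_score = None, -1.0
    for i, x in enumerate(unlabeled):
        near_labeled = neighborhood(x, labeled, delta)
        near_unlabeled = [j for j in neighborhood(x, unlabeled, delta) if j != i]
        if not near_labeled and not near_unlabeled:
            continue  # empty neighborhood: assumed to be an outlier, skipped
        if near_labeled:
            # Mixed labels nearby mean the sample sits near a class boundary.
            uncertainty = label_entropy([labels[j] for j in near_labeled])
        else:
            # Assumption: a region with no labeled neighbor is fully uncertain.
            uncertainty = 1.0
        # Share of the other unlabeled samples that fall in the neighborhood.
        representativeness = len(near_unlabeled) / max(len(unlabeled) - 1, 1)
        score = uncertainty * representativeness
        if score > best_score:
            best_index, best_score = i, score
    return best_index
```

In an active learning loop, the selected sample would be sent to the expert for labeling, moved from `unlabeled` to `labeled`, and the selection repeated until the labeling budget is used up.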

Keywords/Search Tags:

neighborhood rough set, active learning, sample selection, sample reduction, Spark

Related items

1	Study On The Sample Selection Based On Rough Sets
2	Study On Key Technologies Of Active Learning In Division Classification Model
3	Research Of Attributes Reduction And Samples Reducing Algorithm Based On Neighborhood Rough Sets And Application In Text Categorization
4	Mixed Data Mining Methods Based On Rough Sets Theory
5	Research On Quick Attribute Reduction From Local Perspective
6	The Study And Improvements Of Uncertainty-based Sample Selection
7	Active Sample Selection Algorithm And Its Application In Face Detection
8	Research On Active Learning Method Based On Density Clustering And Its Application
9	Research On Rough Set Method Based On Sample Correlation
10	Research On Big Data Sample Selection Based On MapReduce/Spark