Novelty Detection Based On The Ensemble Probability Information

Posted on: 2019-08-01
Degree: Master
Type: Thesis
Country: China
Candidate: X S Qiao
Full Text: PDF
GTID: 2428330575473665
Subject: Computer application technology
Abstract/Summary:
One of the basic assumptions in most supervised machine learning algorithms is that the class label set is predefined and shared by the training and testing sets, so that the classification model can generalize well. However, in open-domain applications there are always some data instances that differ from the normal data (the data in the training sets) in terms of data distribution, and that do not belong to any class in the predefined class label set in terms of class characteristics. We call these data instances "novelties". In practice, the fact that they are novelties is often ignored and they are simply assigned predefined class labels. However, novelties may carry special meaning and may even contain more valuable information than normal data. How to detect these unexpected novelties, discover knowledge in their presence, and make decisions in their presence are active research topics in data mining. Traditional novelty detection methods have limitations. To address some of them, this dissertation presents two new methods for novelty detection that use probability information based on ensemble learning and that differ from traditional novelty detection approaches. The two methods are summarized as follows.

(1) This dissertation first presents an efficient ensemble learning based method for novelty detection, called Ensemble mean Probability Novelty Value Detection (EPVND). The method provides a metric that characterises each class and uses this metric as the basis for testing for novelties. First, an ensemble system is constructed from the training dataset by building n different types of individual classification models, each of which can output class probability vectors. These class probability vectors are then used to obtain a confidence threshold for each class. A new test sample is regarded as a novelty if its confidence of belonging to any known class is less than the confidence threshold of the corresponding class. Experimental results on UCI datasets, a handwritten digit dataset, and face datasets show the effectiveness of EPVND.

(2) Extending EPVND, this dissertation presents another effective ensemble learning based method for novelty detection, Ensemble mean Probability Distribution for Novelty Detection (EPDND). As in EPVND, an ensemble system is first constructed from the training dataset by building n different types of individual classification models that can output class probability vectors. A set of class probability vectors is then obtained by feeding the training data into the ensemble system. These class probability vectors are used to calculate a confidence distribution for each class, which is called the Class Mean Confidence Distribution or Class Representation Point. In other words, EPDND characterises a class with a vector, whereas EPVND characterises a class with a single value, and a vector carries more information than a single value. The approach uses the distance between the confidence distribution of a single sample and the class mean confidence distribution to measure the proximity of the sample to the class. This distance can be, for example, the Euclidean distance, the Manhattan distance, or cosine similarity; the Euclidean distance is used in this work. The distance threshold must be preset. When the distance exceeds the threshold, the sample is not similar to the class and is rejected by it. A sample is regarded as a novelty when it is rejected by all known classes. Experimental results show that EPDND can detect novelties effectively.
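The two decision rules described in the abstract can be illustrated with a minimal Python sketch. It is not the dissertation's implementation, and several details are assumptions made only for illustration: the function names are hypothetical, the toy probability vectors are made up, the EPVND threshold is taken as the mean minus k standard deviations of the true-class confidence (the abstract does not state how the threshold is derived), and EPVND is read as "flag a sample when no known class accepts it". The classifiers themselves are omitted; the sketch starts from their class probability vectors.

```python
import math
from statistics import mean, pstdev


def ensemble_mean(model_outputs):
    """Average class probability vectors over the models of the ensemble.

    model_outputs[m][i] is the probability vector model m gives to sample i.
    Returns one averaged probability vector per sample.
    """
    n_models = len(model_outputs)
    return [
        [sum(vals) / n_models for vals in zip(*per_sample)]
        for per_sample in zip(*model_outputs)
    ]


def epvnd_thresholds(mean_probs, labels, n_classes, k=1.0):
    """One confidence threshold per class (assumed rule: mean - k * std)."""
    thresholds = []
    for c in range(n_classes):
        conf = [p[c] for p, y in zip(mean_probs, labels) if y == c]
        thresholds.append(mean(conf) - k * pstdev(conf))
    return thresholds


def epvnd_is_novelty(p, thresholds):
    """Novelty if the confidence is below the threshold for every class."""
    return all(conf < t for conf, t in zip(p, thresholds))


def class_points(mean_probs, labels, n_classes):
    """Class mean confidence distribution: mean vector over each class."""
    points = []
    for c in range(n_classes):
        members = [p for p, y in zip(mean_probs, labels) if y == c]
        points.append([mean(col) for col in zip(*members)])
    return points


def epdnd_is_novelty(p, points, dist_threshold):
    """Novelty if the Euclidean distance to every class point exceeds the threshold."""
    return all(math.dist(p, q) > dist_threshold for q in points)


# Toy data (made up): 3 models, 4 training samples, 2 classes.
train_outputs = [
    [[0.90, 0.10], [0.80, 0.20], [0.10, 0.90], [0.20, 0.80]],  # model 1
    [[0.95, 0.05], [0.85, 0.15], [0.05, 0.95], [0.15, 0.85]],  # model 2
    [[0.85, 0.15], [0.75, 0.25], [0.15, 0.85], [0.25, 0.75]],  # model 3
]
train_labels = [0, 0, 1, 1]

# Two test samples: one confidently class 0, one ambiguous.
test_outputs = [
    [[0.90, 0.10], [0.50, 0.50]],  # model 1
    [[0.95, 0.05], [0.60, 0.40]],  # model 2
    [[0.91, 0.09], [0.55, 0.45]],  # model 3
]

train_probs = ensemble_mean(train_outputs)
test_probs = ensemble_mean(test_outputs)

thresholds = epvnd_thresholds(train_probs, train_labels, n_classes=2)
points = class_points(train_probs, train_labels, n_classes=2)

epvnd_flags = [epvnd_is_novelty(p, thresholds) for p in test_probs]
epdnd_flags = [epdnd_is_novelty(p, points, dist_threshold=0.2) for p in test_probs]
print(epvnd_flags, epdnd_flags)
```

On this toy data both rules accept the first test sample and flag the second as a novelty. The distance threshold of 0.2 is an arbitrary illustrative value; the abstract only states that it must be preset.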
Keywords/Search Tags: Machine learning, novelties, novelty detection, ensemble learning, class probability, confidence threshold, confidence distribution