
The Algorithm And Application Research Of Relevance Vector Machine For Large-scale Datasets

Posted on: 2018-12-08    Degree: Master    Type: Thesis
Country: China    Candidate: H Zhong    Full Text: PDF
GTID: 2428330596954796    Subject: Software engineering
Abstract/Summary:
The Relevance Vector Machine (RVM) is a machine learning algorithm based on sparse Bayesian learning; its main purpose is to fit target data for classification and regression prediction. It performs well on classification and regression tasks over small-scale datasets. However, RVM also has drawbacks that restrict its practical application: (1) the model converges slowly, so training is time-consuming; (2) training on large-scale datasets demands excessive time and resources; and (3) it may be unsuitable for imbalanced datasets. To address these problems, this thesis studies the shortcomings of RVM on large-scale datasets of different types (balanced, imbalanced, high-dimensional, low-dimensional) and proposes several hybrid RVM algorithms on Spark. By integrating RVM with granular computing and ensemble learning, these algorithms achieve strong performance on the different dataset types. The main work of this thesis is summarized as follows:

(1) For training on large-scale balanced datasets with few features, we propose Discrete-AdaBoost-RVM (DAB-RVM), which incorporates ensemble learning into RVM to cut the time and resources RVM needs on such datasets. The algorithm splits the large dataset into many small blocks and trains them on the Spark platform. Because random splitting may leave some blocks imbalanced, the algorithm replicates minority samples with the SMOTE algorithm. This method effectively reduces the time and memory consumption of RVM on large training sets, and its performance has been verified on UCI standard datasets and artificial datasets.

(2) For training on large-scale balanced datasets with many features, we propose Gentle-AdaBoost-RVM (GAB-RVM), which adopts the basic framework of DAB-RVM but uses the Gentle AdaBoost algorithm. GAB-RVM also takes full advantage of the abundant samples in large-scale datasets and changes the way weak classifiers are obtained. This algorithm effectively reduces the time and memory consumption of RVM on large-scale, high-dimensional datasets.

(3) For training on large-scale imbalanced datasets, the RGB-RVM and KGB-RVM algorithms are proposed by integrating ensemble learning and granular computing with RVM. Both algorithms split the training data into several blocks and, for each block, apply a suitable method to extract positive and negative information granules; by controlling how the granules are obtained, each training block tends toward balance. Depending on whether training accuracy or training speed is the priority, RVM or KMeans is chosen as the rule for information-granule extraction, giving RGB-RVM and KGB-RVM respectively: the former focuses on training accuracy, while the latter emphasizes training speed.

(4) The proposed algorithms are applied to a fiber Bragg grating bridge health monitoring system. Using data from the sensors deployed on the bridge, the algorithms identify crack damage, and the experimental results verify their validity for fracture damage identification.
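For context on the sparse Bayesian learning that underlies RVM, a standard statement of the model (Tipping's formulation for regression, not specific to this thesis) is:

```latex
% Standard sparse Bayesian (RVM) regression model, following Tipping (2001).
% Gaussian likelihood with noise precision \beta and basis functions \phi:
p(\mathbf{t}\mid\mathbf{w},\beta)
  = \prod_{n=1}^{N}\mathcal{N}\!\left(t_n \,\middle|\, \mathbf{w}^{\top}\boldsymbol{\phi}(\mathbf{x}_n),\ \beta^{-1}\right)
% Zero-mean Gaussian prior with an individual precision \alpha_i per weight:
p(\mathbf{w}\mid\boldsymbol{\alpha})
  = \prod_{i=0}^{N}\mathcal{N}\!\left(w_i \,\middle|\, 0,\ \alpha_i^{-1}\right)
% Hyperparameters are set by maximizing the marginal likelihood:
(\hat{\boldsymbol{\alpha}},\hat{\beta})
  = \arg\max_{\boldsymbol{\alpha},\beta}\ \int p(\mathbf{t}\mid\mathbf{w},\beta)\,p(\mathbf{w}\mid\boldsymbol{\alpha})\,\mathrm{d}\mathbf{w}
```

During this maximization many \alpha_i diverge to infinity and the corresponding basis functions are pruned, which yields the sparsity, but each re-estimation step manipulates an (N+1)-by-(N+1) posterior covariance, which is what makes RVM training slow and memory-hungry for large N and motivates the block-based schemes described above.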
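The block-splitting idea behind DAB-RVM can be pictured roughly as in the sketch below. This is an illustrative outline only: scikit-learn ships no RVM, so a shallow decision tree stands in for the RVM weak classifier, the per-block models are combined by plain majority vote rather than Discrete AdaBoost weighting, and the function names (train_block_ensemble, predict_vote) and thresholds are assumptions, not the thesis implementation.

```python
import numpy as np
from collections import Counter
from imblearn.over_sampling import SMOTE          # minority over-sampling
from sklearn.tree import DecisionTreeClassifier   # stand-in for an RVM classifier

def train_block_ensemble(X, y, n_blocks=8, imbalance_ratio=2.0, seed=0):
    """Split a large dataset into blocks, rebalance skewed blocks with SMOTE,
    train one weak classifier per block, and return the list of classifiers."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    classifiers = []
    for block in np.array_split(idx, n_blocks):
        Xb, yb = X[block], y[block]
        counts = Counter(yb)
        # Random splitting can leave a block imbalanced; oversample if so
        # (assumes each block still holds enough minority samples for SMOTE's neighbours).
        if len(counts) == 2 and max(counts.values()) / min(counts.values()) > imbalance_ratio:
            Xb, yb = SMOTE(random_state=seed).fit_resample(Xb, yb)
        clf = DecisionTreeClassifier(max_depth=3, random_state=seed).fit(Xb, yb)
        classifiers.append(clf)
    return classifiers

def predict_vote(classifiers, X):
    """Majority vote over the per-block classifiers (the thesis combines them
    with AdaBoost weights instead; unweighted voting keeps the sketch short)."""
    votes = np.stack([clf.predict(X) for clf in classifiers])
    return np.apply_along_axis(lambda col: Counter(col).most_common(1)[0][0], 0, votes)
```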
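In the same spirit, the KMeans-based granule extraction used by KGB-RVM for imbalanced blocks can be pictured as compressing the majority class of a block into cluster centers. The helper below (granulate_block, a hypothetical name) is a sketch under the assumptions of numeric binary labels and granulation of the majority class only; the thesis extracts both positive and negative granules and does so within its Spark pipeline.

```python
import numpy as np
from sklearn.cluster import KMeans

def granulate_block(Xb, yb, minority_label, random_state=0):
    """Return a rebalanced block: all minority samples are kept, while the
    majority class is replaced by as many KMeans centers as there are
    minority samples (each center acts as one information granule)."""
    X_min = Xb[yb == minority_label]
    X_maj = Xb[yb != minority_label]
    majority_label = yb[yb != minority_label][0]   # assumes a binary problem
    n_granules = min(len(X_min), len(X_maj))
    km = KMeans(n_clusters=n_granules, n_init=10, random_state=random_state).fit(X_maj)
    X_new = np.vstack([X_min, km.cluster_centers_])
    y_new = np.concatenate([np.full(len(X_min), minority_label),
                            np.full(n_granules, majority_label)])
    return X_new, y_new
```

Swapping KMeans for an RVM-based rule at this step corresponds to the accuracy-oriented RGB-RVM variant, at the cost of slower granule extraction.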
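Finally, a minimal PySpark fragment suggests how per-block training might be distributed, with each partition acting as one block; the RDD name `rows`, the stand-in learner, and the overall wiring are assumptions rather than the pipeline described above.

```python
import numpy as np
from pyspark.sql import SparkSession
from sklearn.tree import DecisionTreeClassifier   # stand-in for an RVM weak classifier

spark = SparkSession.builder.appName("block-training-sketch").getOrCreate()

def train_partition(rows_iter):
    """Train one weak classifier on the records of a single partition (block)."""
    data = np.array(list(rows_iter))
    if data.size == 0:                 # empty partitions yield no model
        return []
    X, y = data[:, :-1], data[:, -1]   # assumes rows of the form [f1, ..., fk, label]
    return [DecisionTreeClassifier(max_depth=3).fit(X, y)]

# Example wiring (the RDD `rows` is assumed to be built elsewhere):
# models = rows.repartition(16).mapPartitions(train_partition).collect()
```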
Keywords/Search Tags: relevance vector machine, granular computing, ensemble learning, large-scale dataset, imbalanced dataset