The Research And Application Of Diverse AdaBoost Relevance Vector Machine In Distributed Environment

Posted on:2019-04-20

Degree:Master

Type:Thesis

Country:China

Candidate:W C Qin

Full Text:PDF

GTID:2428330596466424

Subject:Computer Science and Technology

Abstract/Summary:

Relevance Vector Machine (RVM) is a machine learning algorithm based on sparse Bayesian learning. RVM performs well on small-scale data problems and has been applied in fields such as image processing and fault diagnosis. However, RVM has difficulties with large-scale data processing: training on a large-scale dataset consumes too much memory and time, which reduces learning efficiency. It also yields suboptimal solutions on noisy or unbalanced datasets. To solve these problems, this thesis employs mixed sampling, noise detection, and the AdaBoost method to improve the classification accuracy of RVM on small-scale but unbalanced and noisy datasets. In addition, distributed computing, ensemble learning, and a diversity measure are combined with RVM to handle both balanced and unbalanced large-scale datasets. The main work of this thesis includes:

(1) A mixed sampling method based on random undersampling and ADASYN is adopted to reduce the impact of unbalanced and noisy samples in small-scale datasets on RVM classification. Then a noise detection method based on the distinct characteristics of the probabilistic output of RVM is proposed and applied to AdaBoostRVM (NDAB-RVM) to reduce noise in AdaBoostRVM and obtain a combined boosted RVM model with good classification accuracy on unbalanced and noisy samples.

(2) A distributed ensemble of RVM (DE-RVM) based on a diversity measure is proposed to deal with large-scale datasets using a divide-and-conquer strategy and ensemble learning methods. DE-RVM is implemented on the Spark platform. First, a new partitioning scheme is put forward to solve the problem of data imbalance caused by data partitioning. Then, the proposed NDAB-RVM algorithm is used to train an RVM classifier on each small-scale dataset. Finally, these classifiers are combined into a final RVM ensemble classifier according to the combining strategy that yields the smallest empirical error. Experimental results on real and artificial datasets show that DE-RVM can effectively improve the ability of RVM to process large-scale datasets.

(3) The DE-RVM algorithm is applied to crack identification in bridges, and a crack damage identification model is established based on data collected by accelerometers. Experiments verify the feasibility of the algorithm.
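
The divide-and-conquer idea in item (2) can be illustrated with a minimal Python sketch. The abstract does not specify the partitioning scheme or the combining strategy, so both are assumptions here: partitioning is done by dealing each class round-robin across partitions (a simple stratified split that keeps class ratios similar), and combining is done by weighted voting. The function names `stratified_partitions` and `weighted_vote` are hypothetical, and no RVM training or Spark code is shown.

```python
import random
from collections import defaultdict


def stratified_partitions(labels, n_parts, seed=0):
    """Split sample indices into n_parts groups with similar class ratios.

    Assumption: a plain stratified round-robin split, not the thesis's scheme.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)

    parts = [[] for _ in range(n_parts)]
    for y in sorted(by_class):
        idxs = by_class[y]
        rng.shuffle(idxs)
        # Deal each class round-robin so every partition gets its share.
        for k, idx in enumerate(idxs):
            parts[k % n_parts].append(idx)
    return parts


def weighted_vote(predictions, weights):
    """Combine the labels predicted by several classifiers for one sample.

    Assumption: weighted voting stands in for the unspecified combining
    strategy; ties resolve to the smallest label.
    """
    score = defaultdict(float)
    for label, w in zip(predictions, weights):
        score[label] += w
    return max(sorted(score), key=lambda lab: score[lab])


# 90 majority-class and 10 minority-class samples, split five ways.
labels = [0] * 90 + [1] * 10
parts = stratified_partitions(labels, n_parts=5)
minority_per_part = [sum(labels[i] for i in part) for part in parts]
```

In a full pipeline, one classifier would be trained per partition and the per-partition predictions passed to the combiner, with weights derived from something like validation accuracy; that choice is also an assumption rather than the thesis's method.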

Keywords/Search Tags:

Relevance Vector Machine, Noise, Ensemble Learning, Diversity Measure, Large-Scale Data Sets

Related items

1	The Algorithm And Application Research Of Relevance Vector Machine For Large-scale Datasets
2	Ensemble Of Oselm For Large Data Sets Classification
3	Research On Ensemble Learning
4	Researches On Support Vector Machine Learning Approaches Based On Ensemble Learning
5	The Classification Of Imbalanced Large Data Sets Based On Map Reduce
6	Research On Binary Imbalanced Large Data Classification And Its Application
7	Study On Imbalanced Data Sets Classification Method And Its Application In Telecommunication
8	The Study On Support Vector Machine Ensemble Learning
9	Optimization Method Research And Application Of Multiple Classifiers Ensemble Based On Diversity Measure
10	Research On Support Vector Machine For Large Scale Imbalanced Data