
The Research And Application Of Diverse AdaBoost Relevance Vector Machine In Distributed Environment

Posted on: 2019-04-20 | Degree: Master | Type: Thesis
Country: China | Candidate: W C Qin
GTID: 2428330596466424 | Subject: Computer Science and Technology
Abstract/Summary:
The Relevance Vector Machine (RVM) is a machine learning algorithm based on sparse Bayesian learning. RVM performs well on small-scale data, and because of this it has been applied in fields such as image processing and fault diagnosis. However, RVM has difficulty with large-scale data. Training on a large dataset consumes excessive memory and time, which lowers learning efficiency. RVM also converges to suboptimal solutions on noisy or imbalanced datasets.

To address these problems, this thesis uses mixed sampling, noise detection, and AdaBoost to improve RVM's classification accuracy on datasets that are small but imbalanced and noisy. It also combines distributed computing, ensemble learning, and a diversity measure with RVM so that both balanced and imbalanced large-scale datasets can be handled. The main contributions of this thesis are as follows:

(1) A mixed sampling method combines random undersampling with ADASYN. It reduces the effect of imbalanced and noisy samples when RVM classifies small-scale datasets. A noise detection method is then proposed, based on the distinctive characteristics of RVM's probabilistic output. It is incorporated into AdaBoostRVM to give NDAB-RVM, which reduces the influence of noise in AdaBoostRVM. The result is a combined boosted RVM model that achieves good classification accuracy on imbalanced and noisy samples.

(2) A distributed ensemble of RVMs based on a diversity measure (DE-RVM) is proposed for large-scale datasets. It uses a divide-and-conquer strategy together with ensemble learning and is implemented on the Spark platform. The method proceeds in three steps:
- A new partitioning scheme addresses the class imbalance that data partitioning can introduce.
- The proposed NDAB-RVM algorithm trains an RVM classifier on each small-scale partition.
- These classifiers are combined into a final RVM ensemble classifier using the combining strategy with the smallest empirical error.
Experimental results on real and artificial datasets show that DE-RVM effectively improves RVM's ability to process large-scale datasets.

(3) DE-RVM is applied to crack identification in bridges. A crack damage identification model is built from accelerometer data, and experiments verify the feasibility of the algorithm.
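The following Python sketch illustrates the mixed sampling step in item (1). It first randomly undersamples the majority class. It then applies the standard ADASYN procedure to synthesize minority samples. ADASYN weights each minority point by the fraction of majority-class points among its k nearest neighbours, so harder regions receive more synthetic samples. The sketch assumes a binary problem stored in NumPy arrays. The function names and the `keep_ratio`, `k`, and `beta` parameters are hypothetical, because the abstract does not state the sampling ratios the thesis actually uses.

```python
import numpy as np


def random_undersample(X, y, majority_label, keep_ratio, rng):
    """Keep a random fraction `keep_ratio` of the majority-class samples (hypothetical ratio)."""
    maj_idx = np.flatnonzero(y == majority_label)
    other_idx = np.flatnonzero(y != majority_label)
    n_keep = max(1, int(round(keep_ratio * maj_idx.size)))
    kept = rng.choice(maj_idx, size=n_keep, replace=False)
    idx = np.concatenate([kept, other_idx])
    return X[idx], y[idx]


def adasyn(X, y, minority_label, k=5, beta=1.0, rng=None):
    """Standard ADASYN oversampling for a binary problem."""
    rng = np.random.default_rng() if rng is None else rng
    min_idx = np.flatnonzero(y == minority_label)
    X_min = X[min_idx]
    n_min = X_min.shape[0]
    n_maj = X.shape[0] - n_min
    G = int(round((n_maj - n_min) * beta))  # total number of synthetic samples
    if G <= 0 or n_min < 2:
        return X, y
    # Distances from each minority sample to every sample, excluding itself.
    d_all = np.linalg.norm(X_min[:, None, :] - X[None, :, :], axis=2)
    d_all[np.arange(n_min), min_idx] = np.inf
    k_all = min(k, X.shape[0] - 1)
    nn_all = np.argsort(d_all, axis=1)[:, :k_all]
    # r_i = share of majority samples among the k neighbours (difficulty of x_i).
    r = (y[nn_all] != minority_label).sum(axis=1) / k_all
    weights = r / r.sum() if r.sum() > 0 else np.full(n_min, 1.0 / n_min)
    g = np.rint(weights * G).astype(int)  # synthetic samples per minority point
    # Neighbours used for interpolation are drawn from the minority class only.
    d_min = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(d_min, np.inf)
    nn_min = np.argsort(d_min, axis=1)[:, :min(k, n_min - 1)]
    synth = []
    for i, g_i in enumerate(g):
        for _ in range(g_i):
            z = X_min[rng.choice(nn_min[i])]
            synth.append(X_min[i] + rng.random() * (z - X_min[i]))
    if not synth:
        return X, y
    X_new = np.vstack([X, np.array(synth)])
    y_new = np.concatenate([y, np.full(len(synth), minority_label, dtype=y.dtype)])
    return X_new, y_new


def mixed_sample(X, y, majority_label, minority_label, keep_ratio=0.6, k=5, beta=1.0, seed=None):
    """Random undersampling of the majority class followed by ADASYN on the minority class."""
    rng = np.random.default_rng(seed)
    X_u, y_u = random_undersample(X, y, majority_label, keep_ratio, rng)
    return adasyn(X_u, y_u, minority_label, k=k, beta=beta, rng=rng)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0.0, 1.0, (200, 2)), rng.normal(2.0, 1.0, (20, 2))])
    y = np.array([0] * 200 + [1] * 20)
    Xb, yb = mixed_sample(X, y, majority_label=0, minority_label=1, keep_ratio=0.5, seed=0)
    print(np.bincount(yb))  # class counts after mixed sampling
```

Take 200 majority points, 20 minority points, and `keep_ratio=0.5`. Undersampling keeps 100 majority points, and ADASYN then adds roughly 80 synthetic minority points. The exact total can differ by a few because the per-sample counts are rounded. Undersampling first keeps the distance matrices small, which matters for a method as memory-hungry as RVM.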
Keywords: Relevance Vector Machine, Noise, Ensemble Learning, Diversity Measure, Large-Scale Data Sets
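Item (2) combines the partition-level RVM classifiers with the combining strategy that has the smallest empirical error, and the ensemble is built around a diversity measure. The abstract does not name the measure or the candidate strategies, so the sketch below rests entirely on assumptions. It uses the common pairwise disagreement measure. A hypothetical greedy rule grows a diverse subset of members. The sketch then picks among three candidate combiners (majority vote, mean probability, and majority vote of the diverse subset) by error on validation data. The per-classifier probabilities stand in for RVM probabilistic outputs and are simulated here.

```python
import numpy as np


def disagreement(pred_a, pred_b):
    """Pairwise disagreement measure: fraction of samples on which two classifiers differ."""
    return float(np.mean(pred_a != pred_b))


def mean_pairwise_disagreement(preds):
    """Average disagreement over all classifier pairs (higher means more diverse)."""
    m = preds.shape[0]
    vals = [disagreement(preds[i], preds[j]) for i in range(m) for j in range(i + 1, m)]
    return float(np.mean(vals)) if vals else 0.0


def majority_vote(probs):
    """Each member votes with threshold 0.5; ties go to class 1."""
    votes = (probs >= 0.5).astype(int)
    return (votes.mean(axis=0) >= 0.5).astype(int)


def mean_probability(probs):
    """Average the members' class-1 probabilities, then threshold at 0.5."""
    return (probs.mean(axis=0) >= 0.5).astype(int)


def diverse_subset(probs, y, size):
    """Hypothetical greedy rule: start from the most accurate member, then
    repeatedly add the member that maximizes average pairwise disagreement."""
    preds = (probs >= 0.5).astype(int)
    errors = (preds != y).mean(axis=1)
    chosen = [int(np.argmin(errors))]
    while len(chosen) < size:
        rest = [i for i in range(len(probs)) if i not in chosen]
        best = max(rest, key=lambda i: mean_pairwise_disagreement(preds[chosen + [i]]))
        chosen.append(best)
    return chosen


def select_combiner(probs, y, subset_size=3):
    """Return (name, error) of the candidate combiner with the smallest empirical error.
    probs: shape (n_classifiers, n_samples), class-1 probabilities on validation data."""
    subset = diverse_subset(probs, y, min(subset_size, len(probs)))
    candidates = {
        "majority_vote": majority_vote(probs),
        "mean_probability": mean_probability(probs),
        "diverse_subset_vote": majority_vote(probs[subset]),
    }
    errors = {name: float(np.mean(pred != y)) for name, pred in candidates.items()}
    best = min(errors, key=errors.get)
    return best, errors[best]


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    y_val = rng.integers(0, 2, 100)
    # Simulated outputs of 5 base classifiers: noisy probabilities around the true label.
    probs = np.clip(y_val + rng.normal(0.0, 0.45, (5, 100)), 0.0, 1.0)
    print(select_combiner(probs, y_val))
```

For simplicity, this sketch uses the same validation labels both to build the subset and to score the combiners. A more careful version would hold out separate data for the final choice so that the reported empirical error is not optimistically biased.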