
Visual Data Classification Based On Random Forests

Posted on: 2017-08-16
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Q Zhang
Full Text: PDF
GTID: 1318330536952932
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of the economy and society, electronic products and computer technology have become widespread in daily life, and a large number of videos and images are produced every day. Research on visual data such as video and images has become one of the central problems in computer vision, pattern recognition, machine learning and related fields. As visual data grows more complex and accumulates into big visual data, a traditional single statistical model can no longer analyze, understand and classify it well. In recent years, machine learning has gradually become the main method and tool with which computer vision, pattern recognition, digital signal processing, automatic control and artificial intelligence mine and statistically analyze big visual data.

Random forests are an important branch of ensemble learning and an efficient method for computer vision understanding. They can be used for both classification and regression. The main idea is to build an integrated system by combining many trained weak classifiers (predictors). When a new instance arrives, each weak classifier (predictor) classifies (predicts) it separately, and the ensemble combines their results by voting (averaging) to produce the output for that instance. Random forests have achieved a great deal and show excellent ability in data mining, pattern recognition, machine vision and artificial intelligence. Despite these achievements in practical applications, feature selection, visual data distribution and the design of the basic model in random forests have not been completely explained or fully researched. This dissertation studies visual data classification based on random forests, and its focus is on using the random forest as the classifier for visual data classification. The main innovations and contributions of this dissertation are:

(1) To improve feature selection in random forests, this dissertation explores the influence of feature selection when the random forest serves as the classifier for visual data classification, and proposes the block-based feature selection random forest. First, all features of the visual data are divided into blocks according to a predetermined rule. Then, all features within a given block, together with random features from the remaining blocks, are used to determine every node split in the decision trees. After all basic models (decision trees) are built, they vote on the class label of a new test sample. The proposed method achieved competitive classification results on the UIUC, UMD, KTH-TIPS, ALOT and FMD databases with Gray Level Co-occurrence Matrix (GLCM), Local Binary Pattern (LBP) and Multi-Fractal Spectrum (MFS) features.

(2) Metric learning on sub-databases is a key problem for random forests. This dissertation enhances the random forest as a classifier for visual data classification by considering the distributions of the sub-databases, and exploits the visual data distribution to guide decisions when building the decision trees in the forest. Gaussian mixture models are used to study the distribution of the original database and of every bootstrapped sub-database. For each database, a vector is obtained from the Gaussian mixture model, and all of these vectors have the same dimensionality. The similarity between the vector of each sub-database and that of the original database is computed with a given distance metric learning method. Decision trees trained on a sub-database with smaller or larger similarity receive a smaller weight when voting for classification. The proposed metric forest algorithm achieved good visual data classification results in experiments on the ALOT, Flower102, Scene-15 and Food101 databases.

(3) The depth of the trees in a decision forest is a key problem for visual data classification. This dissertation presents a new boosted learning algorithm that employs random-depth decision trees to build random forests, which serve as the basic model in AdaBoost. The algorithm is an enhanced development of deep Boosting in which the depth of the decision trees in the forest differs from database to database. The proposed method is an ensemble learning method that combines a deep learning strategy with deep decision trees. The model uses the Boosting scheme as its main framework and has a two-layer structure that replaces the single decision tree of the traditional AdaBoost algorithm with random-depth decision forests. Experimental results on the Letter Recognition database from a machine learning repository and on the FMD database show that the proposed random-depth decision forest Boosting model is feasible and competitive.
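
The block-based candidate selection in contribution (1) and the tree voting step can be illustrated with a minimal Python sketch. It is not the dissertation's implementation: the abstract does not state the blocking rule, how a block is chosen, or how many extra features are drawn, so the consecutive-index blocks, the function names and the `n_extra` parameter are all assumptions made for illustration. The optional weights only hint at the weighted voting of contribution (2); how the weights are derived from the Gaussian mixture models is not modeled here.

```python
import random
from collections import Counter


def make_blocks(n_features, block_size):
    """Split feature indices into consecutive blocks (assumed blocking rule)."""
    return [list(range(start, min(start + block_size, n_features)))
            for start in range(0, n_features, block_size)]


def candidate_features(blocks, block_id, n_extra, rng):
    """Return all features of one block plus n_extra random features from the rest."""
    in_block = list(blocks[block_id])
    rest = [f for i, block in enumerate(blocks) if i != block_id for f in block]
    extra = rng.sample(rest, min(n_extra, len(rest)))
    return in_block + extra


def weighted_vote(predictions, weights=None):
    """Combine per-tree class labels; equal weights give plain majority voting."""
    if weights is None:
        weights = [1.0] * len(predictions)
    totals = Counter()
    for label, weight in zip(predictions, weights):
        totals[label] += weight
    return max(totals, key=totals.get)


# Hypothetical usage: 10 features, blocks of 4, split candidates for block 1.
rng = random.Random(0)
blocks = make_blocks(10, 4)
candidates = candidate_features(blocks, block_id=1, n_extra=2, rng=rng)
label = weighted_vote(["wood", "fabric", "wood"])
```

In a full forest, `candidates` would be the feature subset searched for the best split at a node, and `weighted_vote` would be applied to the labels predicted by the individual trees.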
Keywords/Search Tags: pattern classification, random forest, ensemble learning, visual data, decision tree