Prediction Of Road Traffic Concentration Using Random Forest Algorithm Based On Feature Compatibility

Posted on:2022-07-05

Degree:Master

Type:Thesis

Country:China

Candidate:Ayana Aboma Regassa

Full Text:PDF

GTID:2518306488451744

Subject:APPLIED COMPUTER TECHNOLOGY

Abstract/Summary:

Different decision tree algorithms are commonly used as the base classifiers of the random forest algorithm. In the random forest algorithm based on feature compatibility, the classifier tends to select redundant features and covers a large feature space. To address this problem, this paper proposes an improved algorithm. Taking into account the micro-level logical relationships and the coordination correlation between features, feature compatibility is introduced into the random forest.

The proposed algorithm relies mainly on feature ranking combined with feature selection. Reducing the number of input variables lowers the computational cost, and keeping a moderate number of features improves the performance of the model. The ranking is based on feature weights: a feature with a higher weight is more important for classification and regression. After ranking, the correlation step considers the initial feature vector, an entropy-based measure for node splitting, the probability of each class at a node, and the entropy of that node. Correlation identifies the degree of coherence between features, since features can be positively or negatively correlated with one another and with the target. From the features that are weakly correlated with one another, the feature with the largest negative correlation with the others is selected as the regression attribute, which is then used in the data set.

This paper also introduces and implements the extremely random forest, which takes randomness one step further: in addition to drawing a random subset of features at each node, it also draws the split thresholds at random, and the best of these randomly generated splits is used as the splitting rule. This reduces the variance of the model even further. Outliers are values that deviate from the normal pattern of the data and can cause anomalies in the results produced by algorithms and analytical systems; the paper examines whether outliers are present and how to handle them. Cross-validation is also applied to obtain a more reliable evaluation of the models.

Finally, a UCI data set is used to verify the accuracy of the algorithms. With an average number of attributes, the proposed algorithm achieves higher accuracy than both the traditional random forest and the extremely random forest algorithms, and with the same number of attributes as the random forest it achieves higher training accuracy. In addition, the proposed algorithm has the shortest overall running time.
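The entropy-based node-splitting measure in the abstract can be shown with a short example. This is a minimal Python sketch. It assumes Shannon entropy in bits and uses information gain as the splitting score. The function names and the `"high"`/`"low"` traffic labels are hypothetical and do not come from the thesis.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy in bits: H = sum_k p_k * log2(1 / p_k), where p_k is the
    fraction of samples at the node that belong to class k."""
    n = len(labels)
    if n == 0:
        return 0.0
    return sum((c / n) * log2(n / c) for c in Counter(labels).values())


def information_gain(parent, left, right):
    """Entropy of the parent node minus the size-weighted entropy of its children."""
    n = len(parent)
    children = (len(left) * entropy(left) + len(right) * entropy(right)) / n
    return entropy(parent) - children


# Hypothetical node holding two "high" and two "low" traffic-concentration samples.
node = ["high", "high", "low", "low"]
print(entropy(node))                                              # 1.0
print(information_gain(node, ["high", "high"], ["low", "low"]))  # 1.0 (a perfect split)
print(information_gain(node, ["high", "low"], ["high", "low"]))  # 0.0 (an uninformative split)
```

A tree learner that uses this criterion evaluates candidate splits and keeps the one with the largest gain.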
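The abstract uses correlation to find redundant features. The sketch below shows one common way to do this, and it may not match the thesis's exact procedure. Features are first ranked by their absolute Pearson correlation with the target. The loop then greedily skips any feature that is too strongly correlated with a feature it has already kept. The 0.9 redundancy threshold, the function names, and the traffic feature names are illustrative assumptions.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a constant column carries no linear information
    return cov / (sx * sy)


def select_features(columns, target, max_redundancy=0.9):
    """Rank features by |corr with target|, then skip any feature whose |corr|
    with an already-kept feature exceeds `max_redundancy` (assumed threshold)."""
    ranked = sorted(columns, key=lambda name: abs(pearson(columns[name], target)),
                    reverse=True)
    kept = []
    for name in ranked:
        if all(abs(pearson(columns[name], columns[k])) <= max_redundancy for k in kept):
            kept.append(name)
    return kept


# Hypothetical traffic features and target.
columns = {
    "vehicle_count": [10, 20, 30, 40, 50],
    "vehicle_count_x2": [20, 40, 60, 80, 100],  # a rescaled copy, hence redundant
    "hour": [7, 9, 8, 7, 9],
}
concentration = [1, 2, 3, 4, 5]
print(select_features(columns, concentration))
```

The first two columns are perfectly correlated, so the filter keeps only one of them together with `hour`.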
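The extremely random forest step can be sketched for a single node split. In the usual formulation, an extremely randomized tree follows three steps at each node:

1. It samples a random subset of candidate features.
2. It draws one threshold uniformly at random between each candidate's minimum and maximum value at that node.
3. It keeps whichever random split scores best.

The minimal Python sketch below uses information gain as the score. The function name, the toy data, and the choice of entropy as the criterion are assumptions made for illustration.

```python
import random
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (bits) of the class labels at a node."""
    n = len(labels)
    return sum((c / n) * log2(n / c) for c in Counter(labels).values()) if n else 0.0


def extra_trees_split(rows, labels, n_candidates, rng):
    """Return (gain, feature, threshold) for the best of the random candidate
    splits, or None if every sampled feature is constant at this node."""
    features = rng.sample(range(len(rows[0])), n_candidates)
    parent = entropy(labels)
    best = None
    for f in features:
        values = [row[f] for row in rows]
        lo, hi = min(values), max(values)
        if lo == hi:
            continue  # a constant feature cannot split this node
        threshold = rng.uniform(lo, hi)  # random cut point, not an optimized one
        left = [y for row, y in zip(rows, labels) if row[f] < threshold]
        right = [y for row, y in zip(rows, labels) if row[f] >= threshold]
        gain = parent - (len(left) * entropy(left) + len(right) * entropy(right)) / len(labels)
        if best is None or gain > best[0]:
            best = (gain, f, threshold)
    return best


# Hypothetical node: feature 0 separates the classes, feature 1 is constant.
rows = [[1.0, 5.0], [2.0, 5.0], [8.0, 5.0], [9.0, 5.0]]
labels = ["low", "low", "high", "high"]
print(extra_trees_split(rows, labels, n_candidates=2, rng=random.Random(0)))
```

The threshold is random, so different seeds can return different gains. Any draw above 2.0 and up to 8.0 gives a perfect split with gain 1.0. Averaging many such randomized trees is the mechanism the abstract credits with lowering variance.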
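The abstract does not say which outlier rule is used. One common option is Tukey's interquartile-range fences with the usual factor k = 1.5: values outside the fences are dropped. The rule, the factor, and the sample counts below are assumptions. Other detectors, such as z-score thresholds, could fill the same role.

```python
from statistics import quantiles


def iqr_bounds(values, k=1.5):
    """Tukey fences [Q1 - k*IQR, Q3 + k*IQR] computed from the sample quartiles."""
    q1, _, q3 = quantiles(values, n=4)  # default 'exclusive' quartile method
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def remove_outliers(values, k=1.5):
    """Keep only the values that fall inside the Tukey fences."""
    lo, hi = iqr_bounds(values, k)
    return [v for v in values if lo <= v <= hi]


# Hypothetical hourly vehicle counts with one implausible spike.
counts = [120, 130, 125, 128, 122, 900]
print(iqr_bounds(counts))       # (-180.0, 624.0)
print(remove_outliers(counts))  # [120, 130, 125, 128, 122]
```

Quartiles are less sensitive to a single extreme value than the mean and standard deviation, which is why this rule is a common first choice.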
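The abstract uses cross-validation to evaluate the models, and the loop itself can be sketched without any library. The example below is a plain k-fold loop that expects a caller-supplied `fit`/`predict` pair. The majority-class model is only a hypothetical stand-in for a random forest, and the fold count and seed are arbitrary.

```python
import random
from collections import Counter


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle the sample indices once, then deal them round-robin into k folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and the number of samples")
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_val_accuracy(X, y, fit, predict, k=5, seed=0):
    """Mean held-out accuracy: each fold is used once for testing while the
    remaining k - 1 folds are used for training."""
    scores = []
    for test_idx in k_fold_indices(len(y), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        preds = predict(model, [X[i] for i in test_idx])
        correct = sum(p == y[i] for p, i in zip(preds, test_idx))
        scores.append(correct / len(test_idx))
    return sum(scores) / len(scores)


def fit_majority(X, y):
    """Hypothetical stand-in model: remember the most frequent training label."""
    return Counter(y).most_common(1)[0][0]


def predict_majority(model, X):
    return [model] * len(X)


X = [[i] for i in range(10)]
y = ["low"] * 8 + ["high"] * 2
print(cross_val_accuracy(X, y, fit_majority, predict_majority, k=5))  # 0.8
```

To evaluate a random forest or extremely random forest instead, swap in a different `fit`/`predict` pair. The fold logic stays unchanged.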

Keywords/Search Tags:

Random Forest, Extremely Random Forest, Feature Compatibility, Base classifiers, Outliers, Hyperparameters, Cross-validation

Related items

1	Research On Random Forest Algorithm Based On Feature Selection And Diversity
2	Research On Multi-specification Cargo Loading Based On Improved Random Forest Algorithm
3	Detection Method Of DDoS Attack Based On Random Forest
4	Research On Detection Of Abnormal Mobile Communication Users Based On Improved Random Forest
5	Research On Feature Selection Method Based On Random Forest
6	Visual Interpretation And Analysis Of Random Forest
7	Optimization Of Distributed Random Forest Algorithm Based On Hierarchical Subspace
8	Semantic Segmentation Of Street Scene Based On Random Forest Algorithm
9	Research On IP City-level Geolocation Based On Random Forest
10	Research On Feature Selection And Classification Method Based On Random Forest For Medical Datasets