Prediction Of Road Traffic Concentration Using Random Forest Algorithm Based On Feature Compatibility

Posted on: 2022-07-05
Degree: Master
Type: Thesis
Country: China
Candidate: Ayana Aboma Regassa
GTID: 2518306488451744
Subject: APPLIED COMPUTER TECHNOLOGY
Abstract/Summary:
Different decision tree algorithms are commonly used as the base classifiers of the random forest algorithm. This paper proposes an improved random forest algorithm based on feature compatibility to address the problem that the classifier is biased toward selecting redundant features and covers a large feature space. Feature compatibility is introduced into random forest by considering the micro-level logical relationships and the coordination correlation between features. The proposed algorithm mainly uses feature ranking, including feature selection, to reduce the number of input variables, which in turn reduces computational cost and, with a moderate number of features, improves the performance of the model. The ranking is based on the principle that features with higher weight values are more important for classification and regression. After ranking, the correlation step considers the initial feature vector, the entropy-based measure for node splitting, the class probability at that node, and the entropy of the node. Using correlation, we can identify the degree of coherence between features. Features are positively or negatively correlated with one another and with the target. Among the features that are not well correlated with one another, the feature with the largest negative correlation with the others is selected as the regression attribute to be used in the data set and is then applied.

In this paper, the extremely random forest is also introduced and implemented. Extremely random forests take randomness to the next level: along with taking a random subset of features, the thresholds are chosen randomly as well. These randomly generated thresholds are used as the splitting rules, which reduces the variance of the model even further. Outliers are values that deviate from the norm and can cause anomalies in the results obtained through algorithms and analytical systems. The presence of outliers and ways of dealing with them are also addressed in the paper. The models are additionally evaluated using cross-validation.

Lastly, a UCI data set is used to verify the accuracy of the algorithms. With an average number of attributes, the proposed algorithm achieves higher accuracy than the traditional random forest algorithm and the extremely random forest algorithm; with the same number of attributes as the random forest algorithm, it achieves higher training accuracy. Additionally, the proposed algorithm has the shortest overall running time.
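
The difference between an entropy-based split and the randomly chosen thresholds of an extremely random forest, both mentioned in the abstract, can be illustrated with a minimal sketch. It is not the thesis implementation: the toy data and the helper names (`entropy`, `information_gain`, `best_threshold`, `random_threshold`) are assumptions made for illustration, and only the Python standard library is used.

```python
import math
import random
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(values, labels, threshold):
    """Entropy reduction obtained by splitting on `value <= threshold`."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(labels) - weighted


def best_threshold(values, labels):
    """Exhaustive search: try every midpoint between sorted distinct values.

    Assumes the feature has at least two distinct values.
    """
    points = sorted(set(values))
    candidates = [(a + b) / 2 for a, b in zip(points, points[1:])]
    return max(candidates, key=lambda t: information_gain(values, labels, t))


def random_threshold(values, rng):
    """Extremely-random style: draw one threshold uniformly in the value range."""
    return rng.uniform(min(values), max(values))


# Hypothetical toy feature column and class labels (not from the thesis data).
values = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
labels = ["a", "a", "a", "b", "b", "b"]
rng = random.Random(0)  # fixed seed so the random draw is repeatable

t_best = best_threshold(values, labels)
t_rand = random_threshold(values, rng)
print("searched threshold:", t_best, "gain:", information_gain(values, labels, t_best))
print("random threshold:", t_rand, "gain:", information_gain(values, labels, t_rand))
```

The searched threshold maximizes the information gain on this node, while the random threshold skips the search entirely. A single random split is usually no better than the searched one, but averaging many such trees is what lowers the variance of the ensemble.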
Keywords/Search Tags: Random Forest, Extremely Random Forest, Feature Compatibility, Base Classifiers, Outliers, Hyperparameters, Cross-validation