
Vehicle Logo Classification Using Support Vector Machine Ensemble

Posted on: 2014-08-12    Degree: Master    Type: Thesis
Country: China    Candidate: Wesal Abdallah Mohammed Abdelr
GTID: 2268330425972439    Subject: Computer Science and Technology

Abstract/Summary:
Vehicle classification has lately attracted much scientific and commercial attention. It plays an important role in intelligent transportation systems and can be used for security in sensitive areas such as government buildings, army camps, and country borders, as well as for managing traffic congestion, parking problems, and traffic accidents. A vehicle can be classified by extracting information from its features, including its shape, license plate, color, and model. Most vehicle classification systems use the license plate, but the plate becomes useless when it is forged, missing, or covered. Another important attribute of a vehicle is its logo, which carries important information about the vehicle and cannot easily be tampered with; besides its decorative purpose, it therefore plays an important role in classifying vehicles.

This dissertation accordingly presents approaches to classifying vehicles by their logo, the graphic mark or image used by the vehicle manufacturer. The approaches are based on the Support Vector Machine (SVM), a machine-learning method built on statistical techniques. It addresses the problem of identifying to which of a set of categories or sub-populations a new observation belongs, given a training set of observations whose category membership is known, and it characterizes each object by a set of features or parameters that should be relevant to the task at hand. An under-sampling approach and an SVM ensemble are used to improve the classification accuracy.

The data set we used contains four vehicle logo types: Volkswagen, Hyundai, Nissan, and Toyota. We started with a standalone SVM and used two-dimensional principal component analysis (2DPCA) to extract features from the logo images. As opposed to conventional PCA, 2DPCA is based on 2D matrices rather than 1D vectors: the image matrix does not need to be transformed into a vector first; instead, an image covariance matrix is constructed directly from the original image matrices. The SVM then uses the extracted features as input for vehicle logo classification. Several methods allow multi-class classification with SVMs; we used the one-versus-all method, in which a single binary classifier is trained per class to distinguish that class from all other classes. Prediction is performed by running each binary classifier and choosing the prediction with the highest confidence score. We used LIBSVM, a widely used SVM library.

Feature scaling, also known as data normalization, is used as a preprocessing step to standardize the range of the independent variables (features). Normalizing features before passing them to the SVM is important because the ranges of raw values vary widely: since the SVM classifies by computing distances between training points, a feature with a broad range of values would dominate the distance. Scaling also avoids numerical difficulties during the calculation, because kernel values usually depend on the inner products of feature vectors.
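The 2DPCA step can be illustrated with a short sketch. The following NumPy code is a minimal, illustrative reconstruction of the idea described above, not the thesis implementation; the number of kept components and the function names are assumptions.

```python
import numpy as np

def two_d_pca(images, n_components=8):
    """Learn a 2DPCA projection. images: array of shape (M, h, w)."""
    mean_image = images.mean(axis=0)               # average logo image
    centered = images - mean_image
    # Image covariance matrix built directly from the 2D matrices:
    # G = (1/M) * sum_i (A_i - mean)^T (A_i - mean), shape (w, w)
    G = np.einsum('ihw,ihv->wv', centered, centered) / len(images)
    eigvals, eigvecs = np.linalg.eigh(G)           # eigenvalues in ascending order
    X = eigvecs[:, ::-1][:, :n_components]         # keep the leading eigenvectors
    return mean_image, X

def extract_features(images, mean_image, X):
    """Project each image onto X and flatten into one feature vector per logo."""
    projected = (images - mean_image) @ X          # shape (M, h, n_components)
    return projected.reshape(len(images), -1)
```

The flattened feature vectors are then scaled and passed to the SVM, as described next.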
We used linear scaling, in which each value is mapped into the range [0, 1]. The effectiveness of an SVM also depends on the choice of kernel among the four common kernel functions. In general the RBF kernel is a reasonable first choice, so we selected it; it should transform the data so that the classes become separable in the feature space. When the RBF kernel is used with LIBSVM, two parameters need to be optimized, C and γ. They are optimized with a grid search under cross-validation: various pairs of (C, γ) values are tried and the pair with the best cross-validation accuracy is picked. Based on 10-fold cross-validation, the optimal values were found to be C = 2.0 for the Volkswagen and Nissan types, C = 4.0 for the Hyundai and Toyota types, and γ = 0.03125 for all vehicle types.

We evaluated the SVMs on the datasets containing each vehicle type's logos using 5-fold cross-validation, in which the dataset is divided into 5 samples, and we adopted a leave-one-out scheme over these samples: one sample is removed from the set, training is done on the remaining 4 samples, and testing is done on the removed sample. This process is repeated 5 times, removing each sample in turn, and the final prediction result is taken as the average over the 5 test samples.

We also combined the k-means clustering algorithm with the SVM to improve the classification accuracy. K-means partitions the data into k groups, or clusters. Because we use the one-versus-all method for multi-class classification on the vehicle logo data, we propose dividing the negative class into three clusters, expecting the 5-fold cross-validation accuracy of each cluster combined with the whole positive set to improve, since each cluster together with the whole positive set forms an approximately balanced training set. First, the whole negative set is divided into three subsets by k-means clustering on the original variables. Then the whole positive set together with each cluster is used to build one SVM model, giving three SVM models. The results of the three cluster models are aggregated by averaging the estimated probabilities.

Secondly, to improve on the classification performance of the standalone SVM, we used an under-sampling approach and an SVM ensemble with bagging. Because we use one-versus-all SVMs to classify the car logo data into the four classes Volkswagen, Hyundai, Nissan, and Toyota, the positive class is much smaller than the negative one, which makes the car logo classification an imbalanced classification problem. The under-sampling technique keeps the whole positive class and randomly draws a subset of the negative class. This under-sampling is repeated 10 times to create 10 training sets, which are used to train 10 SVM classifiers independently; the trained SVMs are then aggregated to make a collective decision. We tried different percentages for the size of the subset drawn from the negative class (50%, 75%, and 85%). The results show that under-sampling with 85% of the negative class achieved the highest accuracy for all vehicle logos.
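As a concrete illustration of the scaling and parameter search described above, the sketch below uses scikit-learn (whose SVC class wraps LIBSVM) rather than LIBSVM directly, scaling features to [0, 1] and grid-searching C and γ for one binary logo-versus-rest classifier. The parameter grid and fold count are illustrative assumptions, not the exact settings used in the thesis.

```python
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

def train_rbf_svm(features, labels):
    """labels: 1 for the target logo class, 0 otherwise (one binary model per class)."""
    pipeline = Pipeline([
        ('scale', MinMaxScaler(feature_range=(0, 1))),   # linear scaling to [0, 1]
        ('svm', SVC(kernel='rbf')),                      # RBF-kernel SVM
    ])
    param_grid = {
        'svm__C': [2.0 ** k for k in range(-5, 6)],      # exponentially spaced C values
        'svm__gamma': [2.0 ** k for k in range(-10, 1)], # exponentially spaced gamma values
    }
    search = GridSearchCV(pipeline, param_grid, cv=10)   # 10-fold cross-validation
    search.fit(features, labels)
    return search.best_estimator_, search.best_params_
```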
The simulation results also show that under-sampling 85% of the negative class achieves better results than the standalone SVM for Hyundai and Nissan, and worse results than the standalone SVM for Volkswagen and Toyota.

The bagging technique, in which samples are repeatedly drawn with replacement from the original data, is used to implement the SVM ensemble. In bagging, each individual SVM is trained independently on randomly chosen training samples, and the results of the SVMs are aggregated into a collective decision. In this experiment, a random sample was drawn with replacement from the original data set to form each training set, and each training set contains approximately 80% of the original data. Since LIBSVM supports probability estimates, the average of the estimated probabilities was used to make the collective decision.

The experimental results show that the SVM ensemble with bagging achieves significantly better results than both the standalone SVM and under-sampling with 85% of the negative class, for all vehicle logo types. This indicates that, although SVMs are strong classifiers, their results on vehicle logo classification can be improved significantly by an ensemble method.
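A minimal sketch of the bagging ensemble with probability averaging might look like the following. The number of members, the sampling fraction, and the base-SVM parameters (borrowed from the values reported above) are assumptions for illustration, not the thesis code.

```python
import numpy as np
from sklearn.svm import SVC

def train_bagged_svms(X, y, n_members=10, sample_frac=0.8, seed=0):
    """Train n_members SVMs, each on a bootstrap sample drawn with replacement."""
    rng = np.random.default_rng(seed)
    members = []
    for _ in range(n_members):
        idx = rng.choice(len(X), size=int(sample_frac * len(X)), replace=True)
        svm = SVC(kernel='rbf', C=2.0, gamma=0.03125, probability=True)
        svm.fit(X[idx], y[idx])        # assumes both classes appear in every sample
        members.append(svm)
    return members

def predict_bagged(members, X):
    """Collective decision: average the members' estimated class probabilities."""
    probs = np.mean([m.predict_proba(X) for m in members], axis=0)
    return members[0].classes_[np.argmax(probs, axis=1)]
```

Replacing the bootstrap sample with "all positives plus a random 85% subset of the negatives" turns the same aggregation scheme into the under-sampling ensemble described earlier.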
Keywords/Search Tags:Vehicle logo, Classification, Support Vector Machines, Under-sampling, Ensemble, Bagging