
Bird call recognition with artificial neural networks, support vector machines, and kernel density estimation

Posted on: 2007-10-22
Degree: M.Sc
Type: Thesis
University: University of Manitoba (Canada)
Candidate: Ross, Derek J
Full Text: PDF
GTID: 2458390005487494
Subject: Engineering
Abstract/Summary:
This thesis evaluates artificial neural networks (ANNs), support vector machines (SVMs), and kernel density estimation of probability (KDE) on the task of classifying ten species of birds from audio recordings of their calls.

This project had two primary goals. The first goal was to determine whether short-term tonal qualities are adequate for distinguishing bird species. Past research into bird recognition has concentrated on long-term or global characteristics of bird calls, as opposed to short-term qualities.

The second goal was to compare the performance of the three aforementioned pattern recognition algorithms. ANNs have been used for bird recognition in past research, but SVMs and KDE have not been studied in this context.

Recordings were first processed to extract short-term features based on spectral, cepstral, and amplitude characteristics; global features were ignored. Consideration was given to features that would be more resistant to environmental noise.

Three classifiers were trained to recognize a species based on audio recordings that had been separated into frames of 512 samples each. With ANN and SVM, silence and noise frames were rejected by setting a high discrimination threshold, which was determined by finding the optimal point on the receiver operating characteristic (ROC) curve. A discrimination threshold proved problematic with the KDE classifier and was not used.

Recordings from the cross-validation (CV) set were tested by classifying each of the frames as a species, and then processing the collection of votes to determine the likely species of the recording. Two postprocessing methods were used. The first method, simple voting, counted the number of times each species was selected by a classifier. The species that was selected most frequently was considered the winner and became the species estimate for the entire call.
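
The framing and simple-voting steps can be sketched roughly as follows. This is a minimal illustration, not the thesis code: the function names, the use of non-overlapping frames, the use of `None` to mark a rejected (silence or noise) frame, and the tie-breaking behaviour are all assumptions, since the abstract does not specify them.

```python
from collections import Counter


def split_into_frames(samples, frame_len=512):
    """Split a sequence of audio samples into frames of frame_len samples.

    Assumption: frames do not overlap and a trailing partial frame is dropped.
    """
    n_frames = len(samples) // frame_len
    return [samples[i * frame_len:(i + 1) * frame_len] for i in range(n_frames)]


def simple_vote(frame_labels):
    """Return the species selected most often across the frames of a recording.

    Assumption: a frame rejected as silence or noise is represented by None
    and does not vote. Ties go to the species that was encountered first.
    """
    votes = Counter(label for label in frame_labels if label is not None)
    if not votes:
        return None  # every frame was rejected
    return votes.most_common(1)[0][0]


# Hypothetical per-frame classifier outputs for one recording.
labels = ["robin", None, "crow", "robin", "robin", None, "crow"]
winner = simple_vote(labels)
```

Here `winner` is the species estimate for the whole recording; the per-frame labels would come from whichever classifier (ANN, SVM, or KDE) is being evaluated.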
The second method used the chi-squared goodness-of-fit test to match the "confusion row" for a recording to a row in the overall confusion matrix. The row with the lowest chi-squared statistic determined the species.

Both methods gave similar average accuracy, but the chi-squared test raised the score of the worst-performing species, in some cases by significant amounts, and also reduced the variance of accuracy across species. The best average accuracy on the CV set was achieved by an ANN with 100 hidden neurons, with a score of 82% and an accuracy floor of 46%. A figure of merit consisting of the geometric mean of the average CV accuracy and the CV accuracy floor was used to better evaluate performance. Using this metric, one of the three SVM implementations was the best, with an average CV accuracy of 79% and a floor of 63%. KDE performance was comparable to that of an ANN with 20 hidden neurons.
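
The chi-squared matching step and the figure of merit can be illustrated with a short sketch. It rests on several assumptions that the abstract does not confirm: each confusion-matrix row is rescaled to the recording's total vote count to form the expected counts, zero expected counts are replaced by a small `eps`, and the function names and the toy confusion matrix are hypothetical. The figure-of-merit inputs are the accuracies quoted above (reading the SVM average as 79%).

```python
import math


def chi_squared(observed, expected_row, eps=1e-9):
    """Pearson chi-squared statistic between a recording's vote counts and
    one confusion-matrix row rescaled to the same total (an assumption)."""
    total_obs = sum(observed)
    total_exp = sum(expected_row)
    stat = 0.0
    for o, e in zip(observed, expected_row):
        e_scaled = max(e / total_exp * total_obs, eps)  # avoid division by zero
        stat += (o - e_scaled) ** 2 / e_scaled
    return stat


def chi_squared_match(observed, confusion_matrix):
    """Return the index of the confusion-matrix row with the lowest statistic."""
    scores = [chi_squared(observed, row) for row in confusion_matrix]
    return min(range(len(scores)), key=scores.__getitem__)


def figure_of_merit(mean_accuracy, floor_accuracy):
    """Geometric mean of the average accuracy and the accuracy floor."""
    return math.sqrt(mean_accuracy * floor_accuracy)


# Toy confusion matrix (rows: true species, columns: predicted species).
# Species 0 is often mistaken for species 1 at the frame level.
confusion = [[50, 50, 0],
             [10, 90, 0],
             [0, 0, 100]]
votes = [22, 28, 0]  # simple voting would pick species 1
matched = chi_squared_match(votes, confusion)  # picks species 0

fom_ann = figure_of_merit(0.82, 0.46)  # ANN with 100 hidden neurons
fom_svm = figure_of_merit(0.79, 0.63)  # best SVM implementation
```

In the toy case the vote counts resemble the row for species 0 more closely than the row for species 1, even though species 1 received the most votes. This is one way row matching could lift the score of a species that is frequently confused at the frame level. With the quoted accuracies, the figure of merit is about 0.61 for the ANN and about 0.71 for the SVM, consistent with the SVM ranking best under this metric.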
Keywords/Search Tags: KDE, CV accuracy, ANN, Bird, Recognition, Species