
Supervised and unsupervised machine learning for pattern recognition and time series prediction

Posted on: 2009-06-13
Degree: Ph.D
Type: Dissertation
University: The University of Texas at Dallas
Candidate: Bean, Kathryn Brenda
Full Text: PDF
GTID: 1448390005452108
Subject: Statistics
Abstract/Summary:
The problem of empirical data modeling relates to many engineering applications, such as classification, prediction, and pattern recognition. In Chapter 1, I introduce Machine Learning and Data Mining approaches from a Computer Science and Statistics perspective. I have developed a new clustering method, DBBIRCH (Density-Based BIRCH), that combines the features of density- and distance-based clustering algorithms. This method is described in Chapter 2 and is based upon (Bean K., 2007). The algorithm is on-line and, under some realistic assumptions, has a running time asymptotically equal to that of BIRCH. To improve the accuracy of the "distance-based" component, robust statistics (trimmed mean) are used. The density-based feature of the algorithm is achieved by combining initial clusters into networks of density-connected clusters. DBBIRCH provides a fast and precise clustering method for mapping data points to their non-spherical clusters. The algorithm is easily modified to perform parallel clustering of large datasets using grid computing. My prototype program used the breast cancer dataset (UCI Machine Learning Repository) and synthetic datasets to support my conclusions.

I have developed a new framework to improve the performance of partition-type algorithms for the clustering of datasets with missing attributes. Chapter 3 describes this framework, which is based on (Bean K., 2008). I have incorporated CLARA, PAM, and K-means within a framework that remains general enough to allow other clustering algorithms to be used. Initial clustering is performed using a very fast algorithm, BIRCH. This initial step is used to determine input parameters for a more accurate algorithm and to make the prediction of missing attributes more efficient.

Neural network models are among the most popular approaches to flood prediction. This technique, however, has a drawback related to uncertainty about the optimal network structure. I propose an algorithm for neural network pruning to create a Neural Network with Auto- and Cross-Correlation Models (NN-ACC). I believe this approach can determine the best neural network inputs. A forecasting framework for the presented NN-ACC model is constructed to perform calculations for a real-world case study (Derwent catchment of Upper Derwent). According to (Dunham M., 2004), NN-ACC gives much better results than EMM and RLF.
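The abstract does not spell out how NN-ACC uses auto- and cross-correlation, so the following is only a minimal sketch of one common way such correlations can guide input selection: keep a lagged rainfall value as a candidate network input when its correlation with the current flow exceeds a threshold. The function names (`lagged_correlation`, `select_lags`), the threshold, the maximum lag, and the synthetic rainfall and flow series are all assumptions made for illustration and are not taken from the dissertation.

```python
import math
import random


def lagged_correlation(x, y, lag):
    """Pearson correlation between x[t - lag] and y[t], for lag >= 0."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    if lag < 0 or lag >= len(y):
        raise ValueError("lag must satisfy 0 <= lag < len(y)")
    xs = x[: len(x) - lag]  # values of x shifted back by `lag` steps
    ys = y[lag:]            # values of y aligned with those shifted x values
    n = len(ys)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    var_x = sum((a - mean_x) ** 2 for a in xs)
    var_y = sum((b - mean_y) ** 2 for b in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant series carries no correlation information
    return cov / math.sqrt(var_x * var_y)


def select_lags(x, y, max_lag, threshold, min_lag=0):
    """Return the lags whose absolute correlation with y[t] exceeds threshold."""
    return [
        lag
        for lag in range(min_lag, max_lag + 1)
        if abs(lagged_correlation(x, y, lag)) > threshold
    ]


# Synthetic data (hypothetical): flow responds to rainfall two steps earlier.
random.seed(0)
n = 500
rain = [random.gauss(0.0, 1.0) for _ in range(n)]
flow = [
    (rain[t - 2] if t >= 2 else 0.0) + 0.1 * random.gauss(0.0, 1.0)
    for t in range(n)
]

# Cross-correlation: which rainfall lags are worth keeping as inputs?
cross_lags = select_lags(rain, flow, max_lag=5, threshold=0.5)
print("rainfall lags kept as inputs:", cross_lags)
```

Calling `select_lags(flow, flow, max_lag, threshold, min_lag=1)` applies the same test to the flow series itself, which gives the autocorrelation-based candidates. A fixed threshold is the simplest possible rule; a pruning procedure could instead rank lags or test them for statistical significance.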
Keywords/Search Tags: Machine learning, NN-ACC, Neural network, Algorithm