
Object and relational clustering based on new robust estimators and genetic niching with applications to Web mining

Posted on: 2000-11-06
Degree: Ph.D.
Type: Dissertation
University: University of Missouri - Columbia
Candidate: Nasraoui, Olfa
Full Text: PDF
GTID: 1468390014964706
Subject: Statistics
Abstract/Summary:
In this dissertation, we present new robust estimators that attempt to overcome the disadvantages of most existing robust estimation techniques. We also present new robust clustering algorithms based on these estimators, and a novel approach to unsupervised clustering based on genetic niching. The resulting clustering algorithms are applied successfully to mine user profiles from real Web access logs.

The Maximal Density Estimator technique (MDE) is a new linear complexity robust estimator that is free of any presuppositions about the contamination rate in noisy data sets. The Multivariate MDE (MMDE) generalizes MDE for multivariate data sets. MDE and MMDE are computationally attractive and quite insensitive to initialization. Our theoretical analysis shows that MDE and MMDE can be considered as new M-estimators that estimate both location and scale simultaneously, and that they can be expected to be sufficiently protected against very large outliers without compromising their efficiency. Based on MDE and MMDE, we present two new robust clustering algorithms, two unsupervised robust clustering procedures for the case when the number of clusters is unknown, and a new robust relational clustering algorithm that can deal with complex and subjective dissimilarity/similarity measures that are not restricted to be Euclidean.

We explore the use of genetic algorithms in robust clustering in several ways. We extend the objective function of the Least Median of Squares (LMedS) estimator so that it can simultaneously partition a given data set into C clusters, and design a genetic algorithm to search the solution space more efficiently. We also present a novel approach to unsupervised robust clustering, called Unsupervised Niche Clustering (UNC), based on genetic niching and an improved restricted mating scheme to alleviate the problem of crossover interaction between distinct niches.

We introduce a new approach to Web mining based on the extraction of different user profiles from very large amounts of semi-structured Web access log data. We define the notion of a "user session", and present a new subjective dissimilarity measure between two Web sessions. We apply our new robust relational clustering algorithm to extract typical robust session profiles that reflect distinct user interests from real server logs. We also present a hierarchical approach to clustering the Web sessions based on UNC (HUNC) which is computationally much simpler and can determine the number of clusters automatically. This approach offers the advantage of multi-resolution profiling.
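
As a rough illustration of the idea of extending the LMedS objective to C clusters, the sketch below scores a candidate set of cluster centers by the median of the squared distances from each point to its nearest center. This is only one plausible reading of the abstract, not the dissertation's actual formulation: the function name `lmeds_clustering_objective`, the nearest-center assignment, the squared Euclidean distance, and the toy data are all assumptions made for illustration. A genetic algorithm, as described in the abstract, would search over candidate center sets to minimize such a score; that search is not shown here.

```python
from statistics import median


def lmeds_clustering_objective(points, centers):
    """Median of squared distances from each point to its nearest center.

    Hypothetical sketch: assumes Euclidean distance and hard assignment
    of each point to its closest center.
    """
    squared_residuals = []
    for point in points:
        # Squared Euclidean distance to the closest candidate center.
        nearest = min(
            sum((a - b) ** 2 for a, b in zip(point, center))
            for center in centers
        )
        squared_residuals.append(nearest)
    # Taking the median (rather than the sum) keeps a few gross outliers
    # from dominating the score.
    return median(squared_residuals)


# Toy data: two tight groups plus one gross outlier at (100, 100).
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (100, 100)]

good_centers = [(0, 0), (10, 10)]   # ignores the outlier
bad_centers = [(5, 5), (100, 100)]  # spends one center on the outlier

good_score = lmeds_clustering_objective(points, good_centers)
bad_score = lmeds_clustering_objective(points, bad_centers)
print(good_score, bad_score)  # prints: 1 50
```

In this toy case the median-based score prefers the centers that fit the two real groups, even though the outlier contributes a squared residual of 16200 to that solution; a sum-of-squares score would instead reward placing a center on the outlier.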
Keywords/Search Tags:New robust, Clustering, Genetic niching, Web, Estimator, Present, MDE, Approach