
A computational machine for optimal parsimonious hybrid models of e-text classification

Posted on: 2005-07-18
Degree: Ph.D.
Type: Thesis
University: George Mason University
Candidate: Alduhaiman, Khaled S
GTID: 2458390008997934
Subject: Computer Science
Abstract/Summary:
E-text classification is a critical problem in many applied areas, such as e-mail and information retrieval. Users of electronic communications are bombarded with information, much of it undesirable. The Internet is supposed to help users access and receive information, but there is no satisfactory way to filter out the undesirable part, and handling this information overload is becoming an increasingly challenging problem. One approach to partially resolving it is to design an optimal combination of classifiers that keeps the desirable data and filters out the rest. Recently, there has been an explosion of research on using combinations of models to reduce classification errors, and this approach has been applied to e-text classification.

In this thesis, I built a computational parsimonious machine for e-text classification using S-Plus. The machine employs seven methodologies, including neural networks and three statistical approaches (naive Bayes, k-nearest neighbor, and support vector machine), to differentiate between desirable and undesirable data sets, including e-text. First, my analysis demonstrated that optimality in parsimonious hybrid model building is data dependent; therefore, no unique globally optimal model exists. Second, my new methodology selects the best model (i.e., the model with the lowest combined error) for the data set of interest. Finally, I analyzed the importance of a guidance parameter in my measure of error and concluded that its value crucially affects which parsimonious hybrid model is selected as optimal.

I also demonstrated that a popular conjecture is unfounded: namely, that including similar classifiers in a larger pool of candidates for optimal hybrid models may reduce the accuracy of the resulting hybrid classifiers. In fact, my analysis indicates that including similar classifiers may increase the accuracy of the optimal hybrid model.
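The abstract does not say how the machine combines classifiers or how its combined error and guidance parameter are defined. The Python sketch below fills those gaps with loudly labeled assumptions and is not the thesis's S-Plus method: a hybrid model is assumed to be a majority vote over a subset of already-trained classifiers, the combined error is assumed to be λ·FPR + (1 − λ)·FNR with the hypothetical weight `lam` standing in for the guidance parameter, and "parsimonious" is taken to mean preferring fewer members when errors tie. The function names `majority_vote`, `combined_error`, and `select_hybrid` and all toy predictions are hypothetical and purely illustrative.

```python
from itertools import combinations
from typing import Dict, List, Optional, Sequence, Tuple


def majority_vote(predictions: List[Sequence[int]]) -> List[int]:
    """Combine 0/1 predictions by strict majority; ties resolve to 0 (desirable)."""
    k = len(predictions)
    n = len(predictions[0])
    return [1 if 2 * sum(p[i] for p in predictions) > k else 0 for i in range(n)]


def combined_error(y_true: Sequence[int], y_pred: Sequence[int], lam: float) -> float:
    """Hypothetical combined error lam * FPR + (1 - lam) * FNR, with 0 <= lam <= 1.

    Label 1 marks undesirable items, label 0 desirable ones.
    """
    neg = sum(1 for t in y_true if t == 0)
    pos = len(y_true) - neg
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)  # false alarms
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)  # misses
    fpr = fp / neg if neg else 0.0
    fnr = fn / pos if pos else 0.0
    return lam * fpr + (1 - lam) * fnr


def select_hybrid(
    preds: Dict[str, Sequence[int]], y_true: Sequence[int], lam: float
) -> Tuple[Tuple[str, ...], float]:
    """Score every non-empty subset of classifiers under majority vote.

    Subsets are visited from smallest to largest and only a strictly lower error
    replaces the incumbent, so ties go to the smaller (more parsimonious) hybrid.
    """
    if not preds:
        raise ValueError("need at least one classifier")
    names = sorted(preds)
    best: Optional[Tuple[Tuple[str, ...], float]] = None
    for size in range(1, len(names) + 1):
        for subset in combinations(names, size):
            voted = majority_vote([preds[name] for name in subset])
            err = combined_error(y_true, voted, lam)
            if best is None or err < best[1]:
                best = (subset, err)
    assert best is not None
    return best


if __name__ == "__main__":
    # Made-up 0/1 predictions on six items, purely illustrative (not thesis data).
    y = [1, 1, 1, 0, 0, 0]
    toy = {
        "nb": [1, 1, 1, 1, 1, 0],   # aggressive: no misses, two false alarms
        "knn": [1, 0, 0, 0, 0, 0],  # cautious: no false alarms, two misses
        "svm": [1, 1, 0, 1, 0, 0],  # one miss, one false alarm
    }
    for lam in (0.2, 0.8):
        print(lam, select_hybrid(toy, y, lam)[0])  # 0.2 ('nb',) then 0.8 ('knn',)

    # Three classifiers whose errors fall on different items.
    diverse = {"a": [0, 1, 1, 0, 0, 1], "b": [1, 0, 1, 1, 0, 0], "c": [1, 1, 0, 0, 1, 0]}
    print(select_hybrid(diverse, y, 0.5)[0])  # ('a', 'b', 'c')
```

On the toy data, moving `lam` from 0.2 to 0.8 switches the selection from the aggressive classifier to the cautious one, a small-scale echo of the claim that the guidance parameter drives which model is chosen; on the three classifiers whose errors fall on different items, the full vote beats every individual member. Exhaustive search scores 2^k − 1 subsets for k classifiers, which stays cheap if each of the seven methodologies contributes one classifier (127 subsets).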
Keywords/Search Tags: Hybrid model, Optimal, Classification, E-text, Machine, Information, Including