Font Size: a A A

Spectral methods for data clustering

Posted on:2003-12-15Degree:Ph.DType:Thesis
University:The Pennsylvania State UniversityCandidate:He, XiaofengFull Text:PDF
GTID:2468390011485631Subject:Computer Science
Abstract/Summary:
With the exponential growth of digital information in general, and on the Internet in particular, there is great demand for developing efficient and effective methods for organizing and retrieving available information. Examples include image segmentation and scientific data sets such as genomics and climate data sets. With the rapid expansion of the Internet, web document clustering plays an increasingly important role in information retrieval and taxonomy management for the World Wide Web and remains an interesting and challenging problem in the field of web computing.; In the past decade, one of the most active research areas of data clustering methods has been spectral graph partition. My thesis focuses on developing the new spectral graph partitioning methods to discover new patterns and knowledge of large data sets, especially the data set of web documents and pure text data. The Min-Max (Mcut) criterion is proposed following the experiments and theoretical analysis of its better performance over other popular methods such as the normalized cut (Ncut) and K-means methods. The application of the normalized cut and Min-max cut on the bipartite graph is developed theoretically. The close relations between the spectral method and other information retrieval methods such as Principle Direction Divisive Partitioning (PDDP) and the K-means method are discussed. Correspondence analysis is investigated in great detail from the point of view of the spectral bi-clustering method.
Keywords/Search Tags:Spectral, Methods, Data, Information
Related items