Study of document clustering using the k-means algorithm
Posted on:2007-03-14
Degree:M.S.
Type:Thesis
University:University of Nevada, Las Vegas
Candidate:Gummuluru, Meghna Sharma
GTID:2448390005974884
Subject:Computer Science
Abstract/Summary:
Document clustering, also called unsupervised document classification, is one of the most commonly used data mining techniques; it groups documents according to a document similarity function. This thesis addresses research issues in categorizing documents with the k-means clustering algorithm, which partitions objects into K groups based on their document representations and similarities. The hypothesis of this thesis is that unsupervised clustering of a set of documents produces results similar to those of their supervised categorization.
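As a rough illustration of the idea in the abstract, the sketch below clusters a few toy documents with k-means and compares the clusters with hand-assigned categories. It is not the thesis's implementation: the toy documents, the category names, the length-normalized term-frequency representation, the squared Euclidean distance, and the choice to seed centroids with the first K documents are all assumptions made for this example.

```python
import math
from collections import Counter


def tf_vector(text, vocab):
    """Length-normalized term-frequency vector (an assumed representation)."""
    counts = Counter(text.lower().split())
    vec = [float(counts[word]) for word in vocab]
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, max_iterations=20):
    # Assumption: seed the centroids with the first k vectors so the run is
    # deterministic; real implementations usually use random or k-means++ seeding.
    centroids = [list(v) for v in vectors[:k]]
    labels = None
    for _ in range(max_iterations):
        # Assignment step: each document joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: sq_dist(v, centroids[c]))
            for v in vectors
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the algorithm has converged
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical toy corpus with hand-assigned (supervised) categories.
docs = [
    "the cat sat on the mat",
    "stock markets fell sharply today",
    "a cat and a kitten play on the mat",
    "investors sold stock as markets fell",
]
categories = ["animals", "finance", "animals", "finance"]

vocab = sorted({word for doc in docs for word in doc.lower().split()})
vectors = [tf_vector(doc, vocab) for doc in docs]
labels = kmeans(vectors, k=2)
print(labels)  # [0, 1, 0, 1]

# Two documents share a cluster exactly when they share a category.
agrees = all(
    (labels[i] == labels[j]) == (categories[i] == categories[j])
    for i in range(len(docs))
    for j in range(len(docs))
)
print(agrees)  # True
```

On this toy corpus the unsupervised clusters coincide with the supervised categories, which is the kind of agreement the hypothesis describes. A four-document example shows only the mechanics and says nothing about how well the result holds on real collections.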