
Frequent item-based text clustering

Posted on: 2004-09-25
Degree: M.Sc.
Type: Thesis
University: Simon Fraser University (Canada)
Candidate: Afshar, Homayoun
GTID: 2468390011968951
Subject: Computer Science
Abstract/Summary:
The volume of information available on the Internet is increasing rapidly, and most of it is in text format, e.g. HTML files, emails, and newsgroup postings. Grouping similar information together makes it easier and faster to browse and find relevant information, and clustering methods have been introduced for this task. Most current clustering methods use a distance function to measure the similarity between the data items being clustered, and group items that are close (that is, more similar) together. Text datasets have two characteristic properties: high dimensionality and large size.

We used the notion of frequent item sets to create FIT-clustering (Frequent Item-based Text Clustering), a clustering algorithm suitable for text datasets that addresses the properties mentioned above and also outperforms earlier clustering methods in clustering quality. (Abstract shortened by UMI.)
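The abstract does not describe how FIT-clustering itself works, so the following is only a generic sketch of the frequent-item-set idea it builds on, not the thesis's algorithm. It treats each document as a set of terms (tokenization and stop-word removal are assumed to have been done already), mines term sets that occur in at least a `min_support` fraction of documents with an Apriori-style level-wise search, and then applies a hypothetical assignment rule: each document joins the cluster labelled by the largest frequent term set it contains. The function names, the `min_support` parameter, and the assignment rule are illustrative assumptions.

```python
from itertools import combinations


def frequent_term_sets(docs, min_support):
    """Apriori-style level-wise mining of term sets contained in at least
    min_support * len(docs) documents. Returns {frozenset: support_count}."""
    n_required = min_support * len(docs)
    # Level 1: count how many documents contain each single term.
    counts = {}
    for doc in docs:
        for term in doc:
            key = frozenset([term])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= n_required}
    result = dict(current)
    k = 2
    while current:
        # Join step: unions of two frequent (k-1)-sets that have exactly k terms.
        candidates = set()
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) == k and all(
                # Prune step: every (k-1)-subset must itself be frequent.
                frozenset(sub) in current for sub in combinations(union, k - 1)
            ):
                candidates.add(union)
        # Count the support of each surviving candidate over all documents.
        current = {}
        for cand in candidates:
            support = sum(1 for doc in docs if cand <= doc)
            if support >= n_required:
                current[cand] = support
        result.update(current)
        k += 1
    return result


def cluster_by_term_sets(docs, min_support):
    """Hypothetical assignment rule (an assumption, not FIT-clustering): each
    document joins the cluster labelled by the largest frequent term set it
    contains; ties go to higher support, then alphabetical label. Documents
    containing no frequent term set are grouped under the empty label ()."""
    frequent = frequent_term_sets(docs, min_support)
    clusters = {}
    for i, doc in enumerate(docs):
        covering = [s for s in frequent if s <= doc]
        if covering:
            best = min(covering, key=lambda s: (-len(s), -frequent[s], sorted(s)))
            label = tuple(sorted(best))
        else:
            label = ()
        clusters.setdefault(label, []).append(i)
    return clusters


if __name__ == "__main__":
    docs = [
        {"cluster", "text", "mining"},
        {"cluster", "text", "frequent"},
        {"cluster", "text", "itemset"},
        {"email", "spam", "filter"},
        {"email", "spam", "header"},
    ]
    print(cluster_by_term_sets(docs, min_support=0.4))
    # prints {('cluster', 'text'): [0, 1, 2], ('email', 'spam'): [3, 4]}
```

In this sketch, clusters are defined by shared term sets rather than by pairwise distances between high-dimensional document vectors, which is one way a frequent-item-set approach can sidestep the dimensionality problem the abstract mentions.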
Keywords/Search Tags: Clustering, Text, Frequent, Information