Frequent item-based text clustering

Posted on: 2004-09-25

Degree: M.Sc

Type: Thesis

University: Simon Fraser University (Canada)

Candidate: Afshar, Homayoun

Full Text: PDF

GTID: 2468390011968951

Subject: Computer Science

Abstract/Summary:

The volume of information available on the Internet is increasing rapidly, and most of it is text, e.g. HTML files, emails, and newsgroup postings. Grouping similar information together makes it easier and faster to view and find relevant information, and clustering methods were introduced for this task. Most current clustering methods use a distance function to measure the similarity between the data items being clustered and group together the items that are close, i.e. more similar. Text data sets have two characteristic properties: high dimensionality and large size. We used the notion of frequent item sets to create a clustering algorithm, FIT-clustering (Frequent Item-based Text Clustering), that is suitable for clustering text data sets, addresses both of these properties, and outperforms earlier clustering methods in clustering quality. (Abstract shortened by UMI.)
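To make the idea of clustering by frequent item sets concrete, the following is a minimal sketch, not the FIT-clustering algorithm itself, whose details the shortened abstract does not give. It assumes each document is already tokenized into a list of terms, mines term sets that occur in at least `min_support` documents, and assigns each document to the largest frequent term set it contains. The function names, the `min_support` and `max_size` parameters, and the tie-breaking rule are all hypothetical choices made for illustration.

```python
from collections import defaultdict
from itertools import combinations


def frequent_term_sets(docs, min_support, max_size=2):
    """Map each frequent term set to the set of document indices containing it.

    Illustrative level-wise search; assumed for this sketch, not taken
    from the thesis.
    """
    # Record, for every single term, which documents contain it.
    cover = defaultdict(set)
    for index, terms in enumerate(docs):
        for term in set(terms):
            cover[frozenset([term])].add(index)

    # Keep only terms that appear in enough documents.
    current = {s: d for s, d in cover.items() if len(d) >= min_support}
    result = dict(current)

    # Grow term sets one term at a time, up to max_size terms.
    for size in range(2, max_size + 1):
        candidates = {}
        for a, b in combinations(current, 2):
            union = a | b
            if len(union) != size or union in candidates:
                continue
            # Documents containing the union are exactly those containing both parts.
            shared = current[a] & current[b]
            if len(shared) >= min_support:
                candidates[union] = shared
        result.update(candidates)
        current = candidates
    return result


def cluster_by_term_sets(docs, min_support=2, max_size=2):
    """Assign each document to the largest frequent term set it contains.

    Ties are broken by the number of covered documents, then alphabetically
    (a hypothetical rule chosen only to make the result deterministic).
    Documents with no frequent term set go to the empty-set cluster.
    """
    term_sets = frequent_term_sets(docs, min_support, max_size)
    clusters = defaultdict(list)
    for index in range(len(docs)):
        matching = [s for s, covered in term_sets.items() if index in covered]
        if not matching:
            clusters[frozenset()].append(index)
            continue
        best = max(matching, key=lambda s: (len(s), len(term_sets[s]), sorted(s)))
        clusters[best].append(index)
    return dict(clusters)
```

Because documents are grouped by shared term sets rather than by pairwise distances in the full term space, a sketch like this never computes a distance between two high-dimensional vectors, which is the general motivation the abstract points to.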

Keywords/Search Tags:

Clustering, Text, Frequent, Information

Related items

1	Message Text Clustering Based On Frequent Patterns
2	Text Clustering Method Based On Frequent Itemsets
3	Research On Distributed Text Clustering Based On Frequent Item Set
4	Research On Web Text Clustering And Retrieval Technology
5	The Research Of Text Clustering Based On Frequent Selected Word Set
6	Search Results Clustering Method Based On Maximal Frequent Itemsets
7	Research On Text Clustering Algorithm Based On 2 Degree Frequent Word Sequence
8	Research On Coverless Text Information Hiding Based On Frequent Words In Text Sets
9	Research And Design Of Malicious Information Monitor Strategy Management System
10	Study On Management Of Text Documents Based Content In Dataspace