Efficient representation of cluster structure in large data sets

Posted on:2002-10-23

Degree:Ph.D

Type:Thesis

University:Tufts University

Candidate:Kantabutra, Sanpawat

GTID:2468390011492034

Subject:Computer Science

Abstract/Summary:

A clustering is a grouping of similar objects. Clustering has many applications including visualization, pattern recognition, learning theory, computer graphics and web search engines. In this thesis we introduce an innovative clustering algorithm called Ψ-clustering. While most clustering algorithms assume some form of data distribution for the input set in order for the algorithms to work effectively, Ψ-clustering does not make any assumption about the data. This is a great advantage because the data distribution is rarely known in practice. Ψ-clustering also provides greater flexibility by allowing users to choose the degree of similarity between objects that should place them in the same cluster. We also show that these clusters can be represented efficiently so that we can reconstruct them in reasonable time when the clusters are required in the future. We exhibit an algorithm that finds these representatives for clusters.
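
The idea of letting the user choose how similar objects must be to share a cluster can be pictured with a plain distance threshold. The sketch below is not Ψ-clustering, whose details the abstract does not give. It is a generic, distribution-free grouping in which two points fall in the same cluster when a chain of points connects them and each step is at most `max_distance` long. The function name, the Euclidean distance, and the sample points are all assumptions made for illustration.

```python
from math import dist


def threshold_clusters(points, max_distance):
    """Group points so that two points share a cluster when a chain of
    points links them, each step being at most max_distance apart.

    Generic illustration only; this is NOT the thesis's algorithm.
    """
    parent = list(range(len(points)))

    def find(i):
        # Walk up to the root of i's group, shortening the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Merge the groups of every pair that is close enough (quadratic scan).
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if dist(points[i], points[j]) <= max_distance:
                parent[find(i)] = find(j)

    # Collect the points of each group under its root index.
    clusters = {}
    for i, point in enumerate(points):
        clusters.setdefault(find(i), []).append(point)
    return list(clusters.values())


# Hypothetical sample data: two tight pairs and one isolated point.
sample = [(0, 0), (0, 1), (5, 5), (5, 6), (20, 20)]
tight = threshold_clusters(sample, 1.5)
loose = threshold_clusters(sample, 10)
print(len(tight), len(loose))
```

Raising the threshold merges the two pairs into one cluster while the distant point stays alone, which is the kind of user control over cluster granularity the abstract describes. The sketch makes no attempt at the compact cluster representation discussed in the thesis.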

Keywords/Search Tags:

Data

Related items

1	Seismic Achievement Data ETL Platform Architecture Design And Software System Implementation
2	The Research And Application Of Data Preprocessing In XML Data Warehouse
3	Research On Related Issues Of Unstructured Data
4	The Data Integration, Analysis And Utilization For Hospital Information Based On The Data Warehouse
5	Design And Implementation Of Data Mining Support Subsystem Based On Big Data Of Power
6	Design And Implementation Of Environmental Monitoring Data Management System
7	Research On The Problems And Countermeasures Of Domestic Data Journalism Practice
8	Study On Data Dependency-Based Data Quality Processing Techniques In Data Integration
9	Big Data And Research Of Big Data In Modern Internet Applications
10	Design And Implementation Of The Bayonet Data Integration Platform