Concept, topic, and pattern discovery using clustering

Posted on: 2006-03-12

Degree: Ph.D

Type: Dissertation

University: University of Southern California

Candidate: Chung, Seokkyung

GTID: 1458390008960609

Subject: Computer Science

Abstract/Summary:

In this dissertation, we present a mining framework that uses clustering to extract useful patterns, concepts, and topics from multi-dimensional datasets. In general, there are two kinds of datasets: incremental data and static data. Incremental data are data into which new items are inserted over time. Not all datasets are incremental, however; static data receive no incremental insertions. The appropriate data mining algorithm therefore depends on the nature of the data, and this dissertation accordingly consists of two parts: incremental clustering for incremental data, and batch clustering for static data. For incremental data we target news streams, and for static data we target gene expression data.

In the first part, we propose a mining framework that supports the identification of useful patterns based on incremental data clustering. Given the popularity of Web news services, we focus on mining news streams. A key challenge in news repository management is the high rate of document insertion. To address this problem, we present an incremental hierarchical document clustering algorithm that uses a neighborhood search. The novelty of the proposed algorithm is its ability to identify meaningful patterns (e.g., news events and news topics) while reducing computation by maintaining the cluster structure incrementally. In addition, we propose a topic ontology learning framework that uses the resulting document hierarchy. Experimental results demonstrate that the proposed clustering algorithm produces high-quality clusters and that the topic ontology provides interpretations of news topics at different levels of abstraction.

In the second part, we focus on mining the yeast cell cycle dataset. In molecular biology, co-expressed genes tend to share a common biological function, so it is essential to develop an effective clustering algorithm for identifying sets of co-expressed genes. Toward this end, we propose genome-wide expression clustering based on a density-based approach. Building on an analysis of the strengths and limitations of previous density-based clustering approaches, we present a novel density clustering algorithm that uses a neighborhood defined by k-nearest mutual neighbors. Experimental results indicate that the proposed method successfully identifies co-expressed and biologically meaningful gene clusters.
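The abstract names, but does not spell out, a density-based clustering algorithm whose neighborhood is defined by k-nearest mutual neighbors. The sketch below illustrates only that neighborhood idea under stated assumptions; it is not the dissertation's algorithm. It assumes Euclidean distance (gene expression work often uses a correlation-based distance instead) and a hypothetical DBSCAN-style rule with an illustrative `min_pts` threshold: a point is a core point if it has at least `min_pts` mutual neighbors, and clusters grow along mutual-neighbor links from core points. All function names are hypothetical.

```python
import math
from collections import deque


def knn_lists(points, k):
    """For each point, the set of indices of its k nearest neighbors
    (Euclidean distance, self excluded; ties broken by index)."""
    result = []
    for i, p in enumerate(points):
        others = sorted(
            (math.dist(p, q), j) for j, q in enumerate(points) if j != i
        )
        result.append({j for _, j in others[:k]})
    return result


def mutual_neighbors(points, k):
    """j is a mutual neighbor of i iff each is among the other's k nearest neighbors."""
    knn = knn_lists(points, k)
    return [{j for j in knn[i] if i in knn[j]} for i in range(len(points))]


def mutual_knn_density_clusters(points, k, min_pts):
    """Label each point with a cluster id, or -1 for noise.

    Hypothetical rule (assumption, not from the source): a point is a core
    point if it has at least min_pts mutual k-nearest neighbors; clusters
    expand from core points through mutual-neighbor links.
    """
    mnn = mutual_neighbors(points, k)
    core = [len(mnn[i]) >= min_pts for i in range(len(points))]
    labels = [-1] * len(points)
    cluster_id = 0
    for seed in range(len(points)):
        if not core[seed] or labels[seed] != -1:
            continue
        labels[seed] = cluster_id
        queue = deque([seed])
        while queue:
            i = queue.popleft()
            if not core[i]:
                continue  # border points join a cluster but do not expand it
            for j in mnn[i]:
                if labels[j] == -1:
                    labels[j] = cluster_id
                    queue.append(j)
        cluster_id += 1
    return labels


if __name__ == "__main__":
    # Two tight groups of 2-D "expression profiles" plus one isolated point.
    pts = [(0, 0), (0, 1), (1, 0), (1, 1),
           (10, 10), (10, 11), (11, 10), (11, 11),
           (50, 50)]
    print(mutual_knn_density_clusters(pts, k=3, min_pts=2))
    # → [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

Because two points are linked only when each is among the other's k nearest neighbors, the isolated point (50, 50) still has three ordinary nearest neighbors but no mutual ones, so it is labeled noise rather than being absorbed into the nearest cluster.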

Keywords/Search Tags:

Clustering, Data, Topic, Using, Mining

Related items

1	Concept, topic, and pattern discovery using clustering
2	Topic Web Mining Algorithms Research And Application
3	Temporal Summarisation Based On The Event Topic Mining
4	News Review Topic Mining Based On Clustering And LDA
5	Research On Topic Clustering Algorithm Based On Topic Models
6	Research On Topic Web Crawler For Web Text Mining
7	Forum Based Topic Detection And Tracking Algorithms Study
8	Research On Short Text Topic Information Mining Technology
9	Structured Processing And Topic Mining Of Social Media Knowledge About Public Transportation
10	Research And Realization Of Topic Extraction Based On Text Mining