Study And Application Of Frequent Pattern And Multi-modalities Data Clustering Algorithm

Posted on:2007-10-24

Degree:Master

Type:Thesis

Country:China

Candidate:X J Yang

Full Text:PDF

GTID:2178360212957158

Subject:Computer application technology

Abstract/Summary:

This paper is based on the Decision Support System of Criminal Trial project and focuses on two problems that arise in that system. The first is the incremental maintenance of association rules. The second is a clustering algorithm for data of different types.

In the course of criminal trials, the case databases change continually over time, and so do the rules and patterns produced by association rule mining. Some existing frequent rules may no longer be useful, while others that were not frequent in the past become frequent. Incremental mining is used to address this issue. However, existing methods such as FUP (Fast Update) and IM (Incremental Maintenance) are inefficient and cannot meet the demands of the system. This paper presents an algorithm based on a frequent pattern list. It can handle changes in the minimum support as well as growth of the database. Compared with IM, it improves efficiency by a factor of 2-5 on average. Finally, public data sets and the criminal trial database are used to test the algorithm, and a comparison of running times with the FUP and IM algorithms shows that its efficiency is good.

The project also requires finding cases similar to the current case. Because the number of candidate cases is huge, the search range must be narrowed. Clustering is used for this purpose, so that similar cases are searched for only within one cluster, which saves considerable time. However, the existing clustering algorithms cannot meet this demand, because each of them can handle only one type of data. The case data are multi-modal, including numerical data and text. For this kind of data, we propose a clustering model. Normalized Euclidean distance and cosine similarity are used to compute the distance between objects that have two parts (relational data and text data). Finally, this paper presents an algorithm named CK-Means, which is based on K-Means. Experiments show that this algorithm performs better than clustering on a single data modality.
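
The abstract names the two distance measures but does not say how they are combined, so the following is only a minimal sketch of one plausible reading, not the CK-Means algorithm itself. It assumes each object is a pair (numeric attributes, text), that numeric attributes are scaled by a known per-attribute range, that text is compared through simple term-frequency vectors, and that the two parts are mixed with a hypothetical weight `alpha`. All function and parameter names are illustrative.

```python
import math
from collections import Counter


def normalized_euclidean(a, b, ranges):
    """Euclidean distance with each attribute scaled by its range.

    Dividing by the number of attributes keeps the result in [0, 1]
    when every value lies inside its stated range.
    """
    if not a:
        return 0.0
    total = 0.0
    for x, y, r in zip(a, b, ranges):
        diff = (x - y) / r if r else 0.0
        total += diff * diff
    return math.sqrt(total / len(a))


def cosine_distance(text_a, text_b):
    """1 minus the cosine similarity of whitespace-tokenized term counts."""
    ta = Counter(text_a.lower().split())
    tb = Counter(text_b.lower().split())
    dot = sum(ta[w] * tb[w] for w in ta)
    norm = (math.sqrt(sum(v * v for v in ta.values()))
            * math.sqrt(sum(v * v for v in tb.values())))
    if norm == 0:
        return 1.0  # assumption: an empty text is maximally distant
    return max(0.0, 1.0 - dot / norm)  # clamp tiny negative rounding error


def mixed_distance(obj_a, obj_b, ranges, alpha=0.5):
    """Weighted sum of the relational and text distances.

    `alpha` is a hypothetical mixing weight; the source does not state one.
    """
    numeric = normalized_euclidean(obj_a[0], obj_b[0], ranges)
    textual = cosine_distance(obj_a[1], obj_b[1])
    return alpha * numeric + (1.0 - alpha) * textual


# Usage: two toy "cases" with two numeric attributes and a short description.
case_1 = ([1.0, 10.0], "theft at night")
case_2 = ([0.0, 0.0], "fraud online")
d = mixed_distance(case_1, case_2, ranges=[1.0, 10.0])
```

A distance of this form could replace the plain Euclidean distance in the assignment step of K-Means. How cluster centers are updated for the text part is a separate design question that the abstract does not settle.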

Keywords/Search Tags:

Frequent Pattern, Incremental Mining, Clustering of Multiple-modalities, Frequent Pattern List Algorithm

Related items

1	The Research And Realization Of Mining Frequent Patterns On Business Data Streams
2	A Study On Algorithms Of Weighted Frequent Pattern Mining
3	The Research On The Related Problems Of Association Rule Mining
4	Constraint-Based Frequent Pattern Mining: Novel Applications And New Techniques
5	Research And Application Of Frequent Pattern Mining Algorithm Based On Tissue-like P System
6	Researches On Algorithms For Mining Top-K Frequent Patterns
7	The Techniques Research On Frequent Pattern Mining
8	Study On Frequent Pattern Mining Algorithms And Pruning Strategies
9	Research On Website Optimization Strategy Based On Frequent Pattern Mining
10	The Research And Implementation Of An XML Document Structural Clustering Algorithm Using Frequent Path Pattern