
Study And Application Of Frequent Pattern And Multi-modalities Data Clustering Algorithm

Posted on: 2007-10-24
Degree: Master
Type: Thesis
Country: China
Candidate: X J Yang
Full Text: PDF
GTID: 2178360212957158
Subject: Computer application technology
Abstract/Summary:
This thesis grows out of a project to build a decision support system for criminal trials, and it addresses two problems that arise in that system. The first is the incremental maintenance of association rules. The second is the clustering of data of mixed types.

During criminal trials the case database changes continually, and so do the rules and patterns mined from it by association rule mining. Some rules that were frequent may no longer be useful, while others that were infrequent may become frequent. Incremental mining addresses this issue. However, existing methods such as FUP (Fast Update) and IM (Incremental Maintenance) are too inefficient to meet the demands of the system. This thesis presents an algorithm based on a frequent pattern list that handles both changes in the minimum support threshold and growth of the database. Compared with IM, its efficiency is on average 2 to 5 times higher. The algorithm is tested on public data sets and on the criminal trial database, and a comparison of running times against FUP and IM shows that it is efficient.

The project also requires retrieving cases similar to the current case. Because the number of candidate cases is huge, the search range must be narrowed. Clustering is used for this purpose: similar cases are then sought within a single cluster, which saves considerable time. However, existing clustering algorithms cannot meet this demand, because they handle only a single type of data. The case data are multi-modal, combining numerical data and text. For this kind of data we propose a clustering model in which normalized Euclidean distance and cosine similarity are used to compute the distance between objects that each have two parts (relational data and text data). Finally, the thesis presents an algorithm named CK-Means, which is based on K-Means. Experiments indicate that it performs better than clustering on a single data modality.
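The abstract does not say how the two distances are combined, so the following is only a minimal sketch of one plausible reading, not the CK-Means algorithm itself. It assumes each object is a pair (numeric vector, term-count dictionary), that numeric attributes are min-max scaled to [0, 1], and that the two parts are blended with a hypothetical weight `alpha`. All function names here are illustrative.

```python
import math

def min_max_normalize(rows):
    """Scale each numeric column to [0, 1] (assumed normalization scheme)."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]

def normalized_euclidean(a, b):
    """Euclidean distance over [0, 1]-scaled vectors, divided by sqrt(d)
    so the result also lies in [0, 1]."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))

def cosine_distance(u, v):
    """1 - cosine similarity between two term-count dictionaries."""
    dot = sum(w * v.get(term, 0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 1.0  # treat an empty text as maximally distant
    return 1.0 - dot / (norm_u * norm_v)

def combined_distance(obj_a, obj_b, alpha=0.5):
    """Weighted blend of the relational and text distances.
    alpha is a hypothetical parameter, not taken from the thesis."""
    numeric_a, text_a = obj_a
    numeric_b, text_b = obj_b
    return (alpha * normalized_euclidean(numeric_a, numeric_b)
            + (1 - alpha) * cosine_distance(text_a, text_b))
```

Under these assumptions, a K-Means-style assignment step would place each case in the cluster whose center minimizes `combined_distance`, so both the relational attributes and the text influence cluster membership.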
Keywords/Search Tags:Frequent Pattern, Incremental Mining, Clustering of Multiple-modalities, Frequent Pattern List Algorithm