
The Increment Clustering Algorithm In Financial Data Mining And Its Application And Research

Posted on: 2005-09-22
Degree: Master
Type: Thesis
Country: China
Candidate: X L Sun
Full Text: PDF
GTID: 2168360152469262
Subject: Computer application technology
Abstract/Summary:
Cluster analysis in data mining draws on many traditional methods, but these methods were not designed with large data sets in mind. Extracting knowledge efficiently from large volumes of data is, however, the foremost problem in financial data mining. In addition, traditional cluster analysis has focused mainly on numeric data rather than the other types of data that exist in the financial field, and its output is often difficult to interpret. Cluster analysis for detecting money laundering therefore aims to improve the efficiency of the algorithm, to handle varied data types such as documents and categorical data, and to give a conceptual explanation of the clustering result.
The SAFE-MIDSS (State Administration of Foreign Exchange-Management Information & Decision Support System) project set out to study the application of data mining technology in a money laundering detection system. The fundamental issue in financial data mining is to divide large data sets into meaningful subsets effectively. Because data collection is irregular, the market develops gradually, and management systems lag behind, financial data mining must cope with incomplete data that have complex attributes, noise, and isolated points, with little background knowledge available. The traditional BIRCH algorithm suits large data sets because it is incremental. However, its summary-based clustering method cannot handle categorical data. Although the K-means algorithm can deal with categorical data, its high computational cost makes it difficult to apply to large data sets.
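To make the incremental property concrete, the following is a minimal sketch of the standard BIRCH clustering feature, the triple (N, LS, SS) that summarizes a cluster of numeric points. The class and method names are illustrative assumptions and are not taken from the thesis. The summary is updated in constant time per point without revisiting earlier data, and it relies on sums of coordinates, which is why it does not carry over directly to categorical attributes.

```python
import math
from dataclasses import dataclass, field
from typing import List, Sequence


@dataclass
class ClusteringFeature:
    """BIRCH-style summary of a numeric cluster (names are illustrative)."""
    n: int = 0                                      # N: number of points
    ls: List[float] = field(default_factory=list)   # LS: per-dimension linear sum
    ss: float = 0.0                                 # SS: sum of squared norms

    def add(self, point: Sequence[float]) -> None:
        """Absorb one point without storing it."""
        if not self.ls:
            self.ls = [0.0] * len(point)
        self.n += 1
        for i, x in enumerate(point):
            self.ls[i] += x
        self.ss += sum(x * x for x in point)

    def merge(self, other: "ClusteringFeature") -> None:
        """Combine two summaries; the triple is additive."""
        if other.n == 0:
            return
        if not self.ls:
            self.ls = [0.0] * len(other.ls)
        self.n += other.n
        self.ls = [a + b for a, b in zip(self.ls, other.ls)]
        self.ss += other.ss

    def centroid(self) -> List[float]:
        return [s / self.n for s in self.ls]

    def radius(self) -> float:
        """Root mean squared distance of the points from the centroid."""
        c = self.centroid()
        return math.sqrt(max(self.ss / self.n - sum(v * v for v in c), 0.0))
```

Because the three components simply add, two summaries can be merged without access to the underlying records, which is what allows a single pass over a large data set.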
Building on the K-means center-point algorithm and the incremental BIRCH algorithm, the author proposes the concept of a Core-Tree, which compensates for the weaknesses of both algorithms: a center point represents the summary information in BIRCH, and a class core improves the efficiency of locating the center point. In addition, applying a method based on a conceptual model to the clustering output makes the result easier to understand, which improves output quality. Finally, the author presents a scheme for applying the Core-Tree algorithm to SAFE-MIDSS and shows that the algorithm achieves its intended purpose.
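The abstract does not specify how the Core-Tree is built, so the sketch below is not the author's algorithm. It only illustrates the general idea of replacing a numeric summary with a center point so that categorical records can be clustered incrementally. Everything here is an assumption made for illustration: the per-attribute mode as the center, simple mismatch counting as the distance, the flat list of summaries in place of a tree, and the names `CategoricalSummary`, `mismatch`, and `incremental_cluster`.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def mismatch(record: Sequence[str], center: Sequence[str]) -> int:
    """Number of attributes on which the record differs from the center."""
    return sum(1 for a, b in zip(record, center) if a != b)


class CategoricalSummary:
    """Hypothetical cluster summary: value counts per attribute."""

    def __init__(self, n_attrs: int) -> None:
        self.n = 0
        self.counts = [Counter() for _ in range(n_attrs)]

    def add(self, record: Sequence[str]) -> None:
        self.n += 1
        for counter, value in zip(self.counts, record):
            counter[value] += 1

    def center(self) -> Tuple[str, ...]:
        """Center point: the most frequent value of each attribute."""
        return tuple(c.most_common(1)[0][0] for c in self.counts)


def incremental_cluster(
    records: Sequence[Sequence[str]], threshold: int
) -> Tuple[List[int], List[CategoricalSummary]]:
    """Assign each record, in one pass, to the nearest center or a new cluster."""
    summaries: List[CategoricalSummary] = []
    labels: List[int] = []
    for record in records:
        best, best_d = None, None
        for i, s in enumerate(summaries):
            d = mismatch(record, s.center())
            if best_d is None or d < best_d:
                best, best_d = i, d
        # Open a new cluster when no existing center is close enough.
        if best is None or best_d > threshold:
            summaries.append(CategoricalSummary(len(record)))
            best = len(summaries) - 1
        summaries[best].add(record)
        labels.append(best)
    return labels, summaries
```

This flat version compares each record with every center, so its cost grows with the number of clusters; a tree of summaries, as in BIRCH, is one way to narrow that search.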
Keywords/Search Tags: increment clustering, conceptual clustering, outlier, financial data mining, money laundering