Applied Research On Clustering Techniques In Auto Insurance Business Analysis

Posted on:2008-04-21

Degree:Master

Type:Thesis

Country:China

Candidate:C Peng

Full Text:PDF

GTID:2208360215450222

Subject:Computer software and theory

Abstract/Summary:

Data mining is a relatively new technique that has become increasingly popular in recent years. It lets people discover valuable rules hidden in data that can support scientific decision-making. Data mining has become a discipline in its own right, drawing on many fields and technologies such as databases, pattern recognition, neural networks and computational intelligence.

First, this dissertation introduces the basic concepts, tasks, functions, applications and development trends of data mining, and briefly summarizes clustering analysis in data mining. It then combines the data mining topics of the car insurance business with clustering methods and analyzes how clustering algorithms are applied in this business. There are three data mining topics for this business: customer segmentation, fraud recognition and customer behavior.

This dissertation studies several initialization methods, including random sampling, distance optimization and density estimation. A novel initialization method based on the hierarchical clustering algorithm is proposed. Compared with existing initialization methods, this method can find the natural center of every cluster and is not sensitive to outliers or noise. It is also suitable for clustering initialization on large data sets.

K-means is one of the primary clustering algorithms and belongs to the family of partitioning algorithms. It selects K points at random as the initial cluster centers and converges through an iterative process. The output of K-means is especially sensitive to these randomly selected initial points, so this way of choosing them may lead to unsatisfactory results. This dissertation applies the new initialization method to modify the original K-means algorithm. The modified K-means draws a subsample from the database that represents the characteristics of the data set, runs the hierarchical algorithm on it to obtain K initial cluster centers, and finally runs K-means from those K initial points. The hierarchical algorithm does not cost much time because it runs on a subsample. The approach is much more effective because the initial cluster centers are close to the natural ones.

K-means converges to a local optimum by iterating, and its running time becomes expensive when it is applied to a large data set. This dissertation proposes a method based on the vector inner-product inequality to reduce the time complexity, and modifies the original K-means algorithm accordingly. The experimental results show that the modified K-means outperforms the original K-means and is both effective and efficient.

Finally, a prototype data mining system for car insurance CRM is presented and implemented, based on the analysis of the car insurance business. The emphasis is on the architecture design, function design, component design and data processing of the system. The results of the functions that have been implemented are also shown.
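The modified K-means outlined in the abstract (subsample, hierarchical clustering to obtain K seeds, then ordinary K-means iterations) can be illustrated with a minimal pure-Python sketch. This is not the dissertation's implementation. The function names, the `sample_size` parameter and the choice of centroid linkage for the hierarchical step are all assumptions made for illustration. The abstract does not state the inner-product inequality it uses. The pruning test below uses the bound ||x - c||^2 >= (||x|| - ||c||)^2, which follows from the Cauchy-Schwarz inequality, as one possible reading.

```python
import math
import random


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def hierarchical_seeds(sample, k):
    """Agglomerative clustering (centroid linkage, an assumption):
    repeatedly merge the two clusters with the closest centroids
    until k clusters remain, then return their centroids."""
    clusters = [[p] for p in sample]
    while len(clusters) > k:
        centers = [centroid(c) for c in clusters]
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = sq_dist(centers[i], centers[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i].extend(clusters[j])
        del clusters[j]
    return [centroid(c) for c in clusters]


def nearest(point, centers, center_norms):
    """Index of the nearest center. A center is skipped when the lower
    bound (||x|| - ||c||)^2 already rules it out (assumed reading of the
    inner-product inequality; it follows from Cauchy-Schwarz)."""
    x_norm = math.sqrt(sum(v * v for v in point))
    best_i, best_d = 0, sq_dist(point, centers[0])
    for i in range(1, len(centers)):
        if (x_norm - center_norms[i]) ** 2 >= best_d:
            continue  # cannot beat the current best, skip the full distance
        d = sq_dist(point, centers[i])
        if d < best_d:
            best_i, best_d = i, d
    return best_i


def kmeans(data, k, sample_size=50, max_iter=100, seed=0):
    """K-means seeded by hierarchical clustering on a random subsample.
    Assumes data is a list of equal-length tuples with len(data) >= k."""
    rng = random.Random(seed)
    n_sample = min(len(data), max(sample_size, k))
    centers = hierarchical_seeds(rng.sample(data, n_sample), k)
    labels = None
    for _ in range(max_iter):
        norms = [math.sqrt(sum(v * v for v in c)) for c in centers]
        new_labels = [nearest(p, centers, norms) for p in data]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for i in range(k):
            members = [p for p, lab in zip(data, labels) if lab == i]
            if members:  # keep the old center if a cluster is empty
                centers[i] = centroid(members)
    return centers, labels
```

Calling `kmeans(points, 2)` on a list of 2-D tuples returns the final centers and one label per point. The hierarchical step here is quadratic in the sample size per merge, so it stays cheap only when `sample_size` is small relative to the data set, in line with the abstract's remark that the hierarchical algorithm runs on a subsample.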

Keywords/Search Tags:

data mining, clustering analysis, K-means, hierarchical clustering

Related items

1	Research And Application Of New Methods In Symbolic Clustering
2	Semi-supervised K-means Clustering Algorithm In Data Mining
3	Research Of Improving For K-means Clustering Algorithm
4	Research On Dynamic Clustering And Incremental In Data Mining
5	Cluster Study Based On Functional Magnetic Resonance Imaging Data
6	Study Of Auto-Adaption Fuzzy C-Means Clustering Algorithm
7	Mining Analysis Of Mobile Phone Sales Customers Based On Partition And Hierarchical Clustering Method
8	New Non-hierarchical Clustering Objectives And The Algorithms To Optimal Clustering
9	Improvement And Empirical Study Of K-means Clustering Algorithm On Panel Data Analysis
10	Research And Improvement Of K-means Clustering Algorithm