Research on a Parallel K-Means Clustering Method and Its Application to Resume Data

Posted on:2011-12-08

Degree:Master

Type:Thesis

Country:China

Candidate:L N Feng

Full Text:PDF

GTID:2208360308981342

Subject:Computer application technology

Abstract/Summary:

With the rapid development of technology in many fields, the volume of data is increasing dramatically. Under these conditions, the traditional K-Means clustering algorithm used in data mining faces a challenge, so research on improving its efficiency helps make data easier to understand and to use. At present, online recruitment has become a popular channel for most employers, but the large number of resumes makes screening inefficient and wastes human resources. Applying a parallel K-Means clustering algorithm to CV (curriculum vitae) data can help employers recruit suitable people more quickly.

The work of this thesis covers three aspects.

First, to improve the efficiency of the traditional K-Means clustering algorithm, the thesis proposes a clustering algorithm that reduces communication and computation to some extent. The algorithm is implemented using the Master/Slave model in an MPI environment. Evaluation criteria such as algorithm complexity and speedup are given, and experiments comparing the algorithm with traditional K-Means are carried out. The results show that the parallel K-Means algorithm is correct and effective.

Second, the thesis analyzes the characteristics of CV data and studies methods of feature extraction. The CV data is then processed with the traditional K-Means clustering algorithm in order to verify the validity of the proposed features. The experimental results show that the extracted features effectively reflect the CV information and that cluster analysis can quickly identify representative CV information.

Finally, the proposed parallel K-Means clustering algorithm is applied to the CV data and compared with the traditional K-Means clustering algorithm. The results show that the parallel K-Means clustering algorithm processes the CV data accurately and effectively.

In conclusion, the thesis starts from a practical problem and studies the traditional K-Means clustering algorithm through both theoretical analysis and practical application. The work improves the efficiency of the algorithm and extends its range of applications.
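The abstract does not give the details of the proposed algorithm, so the following is only a minimal sketch of the common data-parallel Master/Slave scheme for K-Means, not the thesis's method. It assumes each slave holds one chunk of the data and returns per-cluster coordinate sums and counts, and that the master reduces these partial results into new centroids. The slaves are simulated sequentially in one process instead of through MPI, and all names (`slave_step`, `master_step`, `parallel_kmeans`, `speedup`) are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]
Partial = Tuple[List[List[float]], List[int]]


def nearest(point: Point, centroids: Sequence[Point]) -> int:
    """Index of the centroid closest to `point` (squared Euclidean distance)."""
    best, best_dist = 0, float("inf")
    for idx, c in enumerate(centroids):
        dist = sum((a - b) ** 2 for a, b in zip(point, c))
        if dist < best_dist:
            best, best_dist = idx, dist
    return best


def slave_step(chunk: Sequence[Point], centroids: Sequence[Point]) -> Partial:
    """Assumed slave work: per-cluster coordinate sums and counts for one chunk."""
    dim = len(centroids[0])
    sums = [[0.0] * dim for _ in centroids]
    counts = [0] * len(centroids)
    for p in chunk:
        k = nearest(p, centroids)
        counts[k] += 1
        for d in range(dim):
            sums[k][d] += p[d]
    return sums, counts


def master_step(partials: Sequence[Partial], centroids: Sequence[Point]) -> List[Point]:
    """Assumed master work: reduce the partial results and recompute centroids."""
    dim = len(centroids[0])
    updated: List[Point] = []
    for k, old in enumerate(centroids):
        total = sum(counts[k] for _, counts in partials)
        if total == 0:
            updated.append(old)  # an empty cluster keeps its previous centroid
            continue
        updated.append(tuple(
            sum(sums[k][d] for sums, _ in partials) / total for d in range(dim)
        ))
    return updated


def parallel_kmeans(points: Sequence[Point], init_centroids: Sequence[Point],
                    n_slaves: int, max_iter: int = 100, tol: float = 1e-9) -> List[Point]:
    """Iterate slave and master steps until no centroid moves more than `tol`."""
    # Round-robin split; under MPI each chunk would live on a different process.
    chunks = [points[i::n_slaves] for i in range(n_slaves)]
    centroids = [tuple(c) for c in init_centroids]
    for _ in range(max_iter):
        # Simulated sequentially here; real slaves would run concurrently.
        partials = [slave_step(chunk, centroids) for chunk in chunks]
        updated = master_step(partials, centroids)
        shift = max(sum((a - b) ** 2 for a, b in zip(u, c))
                    for u, c in zip(updated, centroids))
        centroids = updated
        if shift <= tol:
            break
    return centroids


def speedup(t_serial: float, t_parallel: float) -> float:
    """Speedup as usually defined: serial run time divided by parallel run time."""
    return t_serial / t_parallel
```

For example, `parallel_kmeans([(0.0, 0.0), (0.0, 2.0), (10.0, 10.0), (10.0, 12.0)], [(0.0, 0.0), (10.0, 10.0)], n_slaves=2)` converges to the centroids `(0.0, 1.0)` and `(10.0, 11.0)`. In this scheme only the centroids and the per-cluster sums and counts travel between master and slaves, never the data points themselves, which is one common way to keep communication small.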

Keywords/Search Tags:

Data Mining, cluster, K-Means, parallel computing, MPI, CV data

Related items

1	Parallel Data Mining Theory Research And Application
2	Parallel Processing Technology Research And Application Based On The Cluster Of Massive Remote Sensing Data
3	Study On Novel Approaches Of Data Mining For Earthquake Prediction
4	E-commerce Applications Based On Multi-core Cluster Parallelization
5	Study On Data Partition DBSCAN Using Genetic Algorithm
6	Theoretical And Applied Research On Fuzzy C-means Clustering And Its Cluster Validation
7	The Architecture Design And Implementation Of Cluster-Based Parallel Processing System For Remote Sensing Data
8	Parallel Optimization Of Data Intensive Computing On Sunway TaihuLight
9	The Research On Parallel Computing Technology In Precise Agricultural Climate Division
10	Research On Parallelization Of Data Mining Algorithm Based On Distributed Platforms Spark And YARN