Research on a Parallel K-Means Clustering Method and Its Application to Resume Data

Posted on: 2011-12-08  Degree: Master  Type: Thesis
Country: China  Candidate: L N Feng
GTID: 2208360308981342  Subject: Computer application technology
Abstract/Summary:
With the rapid development of technology in many fields, the amount of data has increased dramatically, and the traditional K-Means clustering algorithm used in data mining faces new challenges. Research on improving the efficiency of the traditional K-Means algorithm therefore helps us understand data better and make better use of it. Online recruitment has become a popular hiring channel for most organizations, but the large number of resumes makes screening inefficient and wastes considerable human resources. Applying a parallel K-Means clustering algorithm to CV (curriculum vitae) data can help employers find suitable candidates more quickly.

The work of this thesis covers three aspects:

Firstly, to improve the efficiency of the traditional K-Means clustering algorithm, this thesis proposes a parallel clustering algorithm that reduces communication and computation to some extent. The algorithm is implemented with the Master/Slave model in an MPI environment. Evaluation criteria such as algorithm complexity and speedup are given, and experiments comparing it with the traditional K-Means algorithm are carried out. The results show that the parallel K-Means algorithm is correct and effective.

Secondly, the thesis analyzes the characteristics of CV data and studies methods of feature extraction. The CV data are then processed with the traditional K-Means clustering algorithm to verify the validity of the proposed features. The experimental results show that the extracted features effectively reflect the information in the CVs and that cluster analysis can quickly uncover representative CV information.

Finally, the proposed parallel K-Means clustering algorithm is applied to the processing of CV data and compared with the traditional K-Means clustering algorithm.
The results demonstrate the accuracy and effectiveness of the parallel K-Means clustering algorithm for processing CV data.

In conclusion, starting from a practical problem, the thesis studies the traditional K-Means clustering algorithm from both a theoretical and a practical perspective. This work improves the efficiency of the algorithm and extends its range of applications.
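The abstract does not spell out how the parallel algorithm divides the work. As a minimal sketch, assuming the common data-parallel formulation of K-Means (not necessarily the thesis's own method), each slave assigns its share of the points to the nearest centroid and returns only per-cluster coordinate sums and counts; the master merges these partial results and computes the new centroids that would be broadcast for the next iteration. Each slave therefore sends k*d + k numbers per iteration instead of its raw points, which is one way such a scheme reduces communication. The exchange is simulated sequentially in plain Python (no MPI calls), and the function names `slave_step`, `master_step` and `parallel_kmeans` are hypothetical.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]
Partial = Tuple[List[List[float]], List[int]]  # (per-cluster sums, per-cluster counts)


def squared_distance(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point: Vector, centroids: List[List[float]]) -> int:
    # Index of the centroid closest to the point (ties go to the lower index).
    return min(range(len(centroids)), key=lambda j: squared_distance(point, centroids[j]))


def slave_step(chunk: List[Vector], centroids: List[List[float]]) -> Partial:
    """Hypothetical slave: assign local points, return only sums and counts."""
    k, d = len(centroids), len(centroids[0])
    sums = [[0.0] * d for _ in range(k)]
    counts = [0] * k
    for p in chunk:
        j = nearest(p, centroids)
        counts[j] += 1
        for i, x in enumerate(p):
            sums[j][i] += x
    return sums, counts


def master_step(partials: List[Partial], old: List[List[float]]) -> List[List[float]]:
    """Hypothetical master: merge partial sums/counts into new centroids."""
    k, d = len(old), len(old[0])
    total_sums = [[0.0] * d for _ in range(k)]
    total_counts = [0] * k
    for sums, counts in partials:
        for j in range(k):
            total_counts[j] += counts[j]
            for i in range(d):
                total_sums[j][i] += sums[j][i]
    new = []
    for j in range(k):
        if total_counts[j] == 0:
            new.append(list(old[j]))  # empty cluster: keep its previous centroid
        else:
            new.append([s / total_counts[j] for s in total_sums[j]])
    return new


def parallel_kmeans(points: List[Vector], initial: List[Vector],
                    n_slaves: int = 4, max_iter: int = 100, tol: float = 1e-9):
    # Static round-robin partition of the data, one chunk per slave.
    chunks = [points[r::n_slaves] for r in range(n_slaves)]
    centroids = [list(c) for c in initial]
    for _ in range(max_iter):
        # Under MPI these calls would run concurrently, one per slave process.
        partials = [slave_step(chunk, centroids) for chunk in chunks]
        new = master_step(partials, centroids)
        shift = max(squared_distance(a, b) for a, b in zip(centroids, new))
        centroids = new
        if shift <= tol:  # centroids stopped moving
            break
    labels = [nearest(p, centroids) for p in points]
    return centroids, labels
```

For example, clustering the six points `(0,0), (0,1), (1,0), (10,10), (10,11), (11,10)` with initial centroids `(0,0)` and `(10,10)` and three slaves labels the first three points as cluster 0 and the last three as cluster 1. Because the merged sums and counts are the same as those a single process would compute, the partitioned run gives the same result as `n_slaves=1`, up to floating-point summation order.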
Keywords/Search Tags: Data Mining, clustering, K-Means, parallel computing, MPI, CV data