We used data mining techniques to rapidly extract valuable rules and patterns from massive, noisy data, making the data easier to understand and use. Because cloud computing offers low cost, high throughput, good compatibility, and stability, we chose it to process the massive data. In this article, we first introduce the key technologies of cloud computing, data mining, and the Hadoop architecture. We then optimized SPRINT, a typical classification algorithm in data mining. After analyzing MapReduce, the key programming model of cloud computing, in depth and expressing the algorithm in that model, we provide a detailed algorithm design and implementation. We then ported the algorithm to the Hadoop platform for distributed computing. Finally, we analyzed the advantages and disadvantages of SPRINT through experimental verification. The experiments show that the execution time of each loop iteration decreases significantly as the number of nodes in the cluster increases, which implies that the algorithm distributes the computation well across the Hadoop cluster to achieve parallelization, thereby improving scalability and reducing execution time.
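To illustrate how a SPRINT-style split search can be expressed in the MapReduce model, the following is a minimal sketch in plain Python, not the implementation described in this article and not Hadoop API code. It assumes a single numeric attribute, records stored as tuples whose last element is the class label, and the Gini index as the split criterion; the function names (`map_phase`, `reduce_phase`, `best_split`, `find_split`) are hypothetical.

```python
from collections import Counter, defaultdict


def gini(counts):
    """Gini index of a class-count histogram."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in counts.values())


def map_phase(partition, attr_index):
    """Map step (assumed): emit ((attribute value, class label), 1) per record."""
    for record in partition:
        *features, label = record
        yield (features[attr_index], label), 1


def reduce_phase(pairs):
    """Reduce step (assumed): sum the counts into one class histogram per value."""
    hist = defaultdict(Counter)
    for (value, label), n in pairs:
        hist[value][label] += n
    return hist


def best_split(hist):
    """Scan the sorted values once and return (threshold, weighted Gini)."""
    values = sorted(hist)
    total = Counter()
    for v in values:
        total.update(hist[v])
    n = sum(total.values())
    below = Counter()
    best = (None, float("inf"))
    for lo, hi in zip(values, values[1:]):
        below.update(hist[lo])      # records with value <= lo
        above = total - below       # remaining records
        n_below = sum(below.values())
        score = (n_below / n) * gini(below) + ((n - n_below) / n) * gini(above)
        if score < best[1]:
            best = ((lo + hi) / 2, score)   # candidate threshold: midpoint
    return best


def find_split(partitions, attr_index):
    """Run the map step on each partition, then reduce and pick the split."""
    pairs = [kv for part in partitions for kv in map_phase(part, attr_index)]
    return best_split(reduce_phase(pairs))


if __name__ == "__main__":
    # Two partitions stand in for data blocks held by two cluster nodes.
    parts = [[(1.0, "a"), (2.0, "a")], [(3.0, "b"), (4.0, "b")]]
    print(find_split(parts, 0))
```

In this sketch the map step runs independently on each partition, which is the part that would be spread across nodes; only the much smaller per-value histograms need to be combined in the reduce step.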