| As smartphones grow more capable, we can not only send e-mail and browse the web but also pay online. Some smartphone functions have gradually replaced those of the computer, and the smartphone has become an indispensable part of our daily lives. As people rely increasingly on smartphones, smartphone security has become more and more important. The growing number of smartphones and the variety of infection methods make security protection more difficult, and the traditional approach of upgrading the virus database can no longer meet the requirements. In this paper, we combine a clustering algorithm with the Spark distributed computing platform to analyze and model virus data, improving the efficiency of mobile phone virus mining. Specifically, smartphone viruses are clustered with the affinity propagation (AP) clustering algorithm, and distributed and incremental processing are realized on Spark. The main work is as follows. Firstly, based on a survey of clustering algorithms, of distributed parallel technology, and of the problems that arise in smartphone virus clustering, a Spark-based affinity propagation clustering algorithm is designed and implemented. It takes into account the characteristics of both affinity propagation clustering and big-data processing, improves data storage, and is suited to processing large volumes of data. The algorithm also uses Spark GraphX to improve mining efficiency. Secondly, a Spark-based incremental affinity propagation clustering algorithm is implemented to cluster dynamic data. New data are compared with the initial modeling data and assigned to clusters using the nearest-neighbor idea; the distributed graph is then expanded to complete the iterative update of the clustering model. 
Thirdly, the architecture of the virus clustering subsystem is designed, including the database design, data preprocessing, the modeling module, and the incremental modeling module. System and algorithm performance tests are completed, and experiments verify the feasibility of both algorithms for mobile phone virus mining. Compared with the K-means algorithm, the affinity propagation algorithm improves clustering accuracy and better distinguishes similar viruses. Through the above work, the design and implementation of the distributed mobile virus clustering subsystem are completed. The system processes smartphone virus data in parallel and does not require the number of clusters or the cluster centers to be specified manually, which also makes it more flexible. Depending on the characteristics of the algorithm, the system can be combined with other clustering algorithms and applied to different scenarios, providing a new solution for clustering-based mining of mobile virus data. This system therefore has important application prospects in the field of mobile phone virus mining. |
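To make the two algorithms summarized above more concrete, the following is a minimal single-machine sketch in Python of standard affinity propagation (responsibility/availability message passing, as introduced by Frey and Dueck) together with a nearest-neighbor assignment step for new data. It is not the thesis's Spark/GraphX implementation. Several details are assumptions made for illustration: each virus sample is taken to be a numeric feature vector, similarity is negative squared Euclidean distance, the preference is the median similarity, damping is 0.5 with a fixed iteration count, and the function names `affinity_propagation` and `assign_incremental` are hypothetical. The incremental step only assigns a new sample to the cluster of its nearest already-modeled sample; expanding the graph and re-running the message updates is omitted.

```python
import statistics
from typing import List, Sequence

Vector = Sequence[float]


def neg_sq_dist(x: Vector, y: Vector) -> float:
    """Assumed similarity s(i, k): negative squared Euclidean distance."""
    return -sum((a - b) ** 2 for a, b in zip(x, y))


def affinity_propagation(points: Sequence[Vector], damping: float = 0.5,
                         max_iter: int = 200) -> List[int]:
    """Hypothetical helper: return, for each point, the index of its exemplar."""
    n = len(points)
    if n < 2:
        return list(range(n))
    s = [[neg_sq_dist(points[i], points[k]) for k in range(n)] for i in range(n)]
    # Preference s(k, k): median of the off-diagonal similarities (assumed default).
    pref = statistics.median(s[i][k] for i in range(n) for k in range(n) if i != k)
    for k in range(n):
        s[k][k] = pref
    r = [[0.0] * n for _ in range(n)]  # responsibilities r(i, k)
    a = [[0.0] * n for _ in range(n)]  # availabilities a(i, k)
    for _ in range(max_iter):
        # r(i, k) = s(i, k) - max over k' != k of [a(i, k') + s(i, k')]
        for i in range(n):
            vals = [a[i][k] + s[i][k] for k in range(n)]
            best = max(range(n), key=lambda k: vals[k])
            first = vals[best]
            second = max(vals[k] for k in range(n) if k != best)
            for k in range(n):
                new = s[i][k] - (second if k == best else first)
                r[i][k] = damping * r[i][k] + (1 - damping) * new
        # a(k, k) = sum over i' != k of max(0, r(i', k))
        # a(i, k) = min(0, r(k, k) + sum over i' not in {i, k} of max(0, r(i', k)))
        for k in range(n):
            pos = [max(0.0, r[i][k]) for i in range(n)]
            total = sum(pos) - pos[k]
            for i in range(n):
                new = total if i == k else min(0.0, r[k][k] + total - pos[i])
                a[i][k] = damping * a[i][k] + (1 - damping) * new
    # Exemplars are points whose self-responsibility plus self-availability is positive.
    exemplars = [k for k in range(n) if r[k][k] + a[k][k] > 0]
    if not exemplars:  # degenerate case: fall back to the single strongest candidate
        exemplars = [max(range(n), key=lambda k: r[k][k] + a[k][k])]
    labels = []
    for i in range(n):
        if i in exemplars:
            labels.append(i)
        else:
            labels.append(max(exemplars, key=lambda k: s[i][k]))
    return labels


def assign_incremental(new_point: Vector, points: Sequence[Vector],
                       labels: Sequence[int]) -> int:
    """Hypothetical helper: give a new sample the exemplar of its nearest modeled sample."""
    nearest = max(range(len(points)), key=lambda j: neg_sq_dist(new_point, points[j]))
    return labels[nearest]


# Usage: two well-separated toy groups, then two new samples.
pts = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]]
labels = affinity_propagation(pts)
print(labels)
print(assign_incremental([0.2, 0.1], pts, labels), assign_incremental([10.1, 10.2], pts, labels))
```

As in the abstract, the number of clusters is never passed in; it emerges from the preference values. A GraphX version would presumably distribute these per-edge message updates across the graph, whereas this sketch runs them as plain nested loops costing O(n²) per iteration, so it is only suitable for small examples.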