| In recent years, foodborne diseases have occurred frequently and have drawn growing public attention. Foodborne disease has become one of the most important public health issues in the world. Eating unhealthy food in daily life is likely to cause foodborne disease. In addition, as things currently stand, food safety awareness and the related foodborne disease monitoring systems have not yet been brought under standardized management and supervision. In the forecasting and analysis of foodborne disease, problems remain: data mining techniques are inefficient and visualization techniques are not clear enough. This paper therefore studies forecasting and analysis methods for foodborne diseases from three aspects. First, a predictive analysis algorithm, Spark-IRF (Improved Random Forests algorithm based on Spark), is proposed; it improves the original algorithm in two respects, dimension reduction and weighted voting. The experimental data are analyzed in terms of accuracy, precision, and recall, and the results show that Spark-IRF outperforms the DRF (Dynamic Random Forests) and Spark-MLRF (Spark Machine Learning Random Forests) algorithms in accuracy and recall. Second, a clustering algorithm, WIK-means (Weighted Intelligent K-means), is proposed to solve an existing problem in the IK-means (Intelligent K-means) algorithm: IK-means uses Euclidean distance to measure the distance between data points and a cluster center and treats every feature variable equally, so two related attributes that lie far apart, or that are given the same importance, may be assigned to different clusters, which distorts the data. When calculating the distance between data points and a cluster center, WIK-means assigns a corresponding weight to each feature variable so as to minimize the sum of the squared errors between entities and their respective centroids. The K-means, IK-means, WK-means (Weighted K-means), and WIK-means algorithms are then compared in detail under different centroids and different numbers of iterations. The comparison shows that WIK-means has clear advantages in time efficiency, space efficiency, and accuracy. Finally, a Spark-based forecasting and analysis system for foodborne diseases (FASBSFD) is designed and implemented, which uses the Random Forests algorithm for predictive analysis and the WIK-means algorithm for clustering visualization. The results show that the prototype is feasible and effective. |
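
The feature-weighted distance described for WIK-means can be illustrated with a minimal sketch. It is not the paper's implementation. It assumes one global weight per feature, supplied by the caller, and it does not reproduce the weight-update rule or the IK-means initialization, since the abstract specifies neither. The function names `weighted_sq_distance`, `assign_clusters`, and `weighted_sse` are hypothetical.

```python
from typing import List, Sequence

Point = Sequence[float]


def weighted_sq_distance(x: Point, c: Point, w: Sequence[float]) -> float:
    """Squared Euclidean distance with one weight per feature (assumed form)."""
    return sum(wv * (xv - cv) ** 2 for xv, cv, wv in zip(x, c, w))


def assign_clusters(points: Sequence[Point],
                    centroids: Sequence[Point],
                    weights: Sequence[float]) -> List[int]:
    """Assign each point to the centroid with the smallest weighted distance."""
    labels = []
    for p in points:
        dists = [weighted_sq_distance(p, c, weights) for c in centroids]
        labels.append(dists.index(min(dists)))
    return labels


def weighted_sse(points: Sequence[Point],
                 centroids: Sequence[Point],
                 labels: Sequence[int],
                 weights: Sequence[float]) -> float:
    """Sum of weighted squared errors between points and their centroids."""
    return sum(weighted_sq_distance(p, centroids[k], weights)
               for p, k in zip(points, labels))


# Illustrative data only: the same point changes cluster once the first
# feature is weighted more heavily than the second.
points = [(1.0, 4.0)]
centroids = [(0.0, 4.0), (1.0, 6.0)]
equal_labels = assign_clusters(points, centroids, (1.0, 1.0))
skewed_labels = assign_clusters(points, centroids, (10.0, 0.1))
```

With equal weights the sketch reduces to ordinary K-means assignment. Unequal weights let a more important feature dominate the distance, which is the effect the abstract attributes to WIK-means.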