| Big data research is now carried out in many fields, and its application in medicine has become a hot topic in medical research. With the rapid development of big data technology, the value of medical big data is self-evident, and protecting patient privacy when medical big data is used has become particularly important. When medical big data is shared and published, the data must retain its utility without exposing patients’ privacy. The k-anonymity method addresses both requirements: it applies generalization or suppression to transform the raw data into equivalence classes of size at least k, so that each record is indistinguishable from at least k-1 other records in its class. Although this sacrifices part of the data's utility, it protects patients’ personal information from being leaked. However, the classic k-anonymity algorithm has some problems and must be improved before it is suitable for the privacy protection of medical big data. The two key problems in medical data anonymization are reducing information loss and reducing time cost while ensuring data security. Traditional k-anonymity algorithms have the following problems when handling the mixed data sets typical of medical data. First, commonly used k-anonymity models over-generalize: over-generalized data protects patient privacy but greatly reduces the practical value of the data. Second, among current k-anonymity models, the micro-aggregation-based model has the lowest information loss, but it is suitable only for numerical data sets, not for mixed data sets such as medical big data. Finally, given the timeliness required of medical data release, the generalization time of the model should be reduced as much as possible during k-anonymization. To solve these problems, this paper identifies a micro-aggregation anonymization algorithm suitable for medical big data: the k-TBM (Transformation Based Method for Microaggregation of Mixed Data) algorithm. Experimental results show that k-TBM incurs lower information loss on the anonymized medical data set than on the anonymized Adult data set. Therefore, this algorithm is a more suitable micro-aggregation-based privacy protection method for medical big data. A comparison of the global k-anonymity algorithm Datafly, the locally generalized multi-tree forest k-anonymity algorithm, the MDAV micro-aggregation algorithm, and the k-TBM algorithm shows that the micro-aggregation-based k-TBM algorithm performs better than the others for medical data privacy protection. Experimental results also show that the time cost of producing the anonymized data table depends on the size of the data set: the larger the data set, the higher the time cost. To reduce the time cost of anonymization, this paper introduces the Clustering by Fast Search and Find of Density Peaks (CFSFDP) algorithm to improve the k-TBM algorithm and proposes a k-TBM anonymization model based on CFSFDP. Experiments show that this model greatly reduces the time cost and further reduces the information loss of the anonymized data table. Therefore, the k-TBM anonymization model based on CFSFDP is a low-time-cost model suitable for the privacy protection of medical big data. |
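
The k-anonymity condition described above (every record shares its quasi-identifier values with at least k-1 other records) can be illustrated with a minimal Python sketch. This is not the k-TBM algorithm or any of the compared methods. The records, the choice of `age` and `zip` as quasi-identifiers, the 10-year age bands, the ZIP truncation, and all function names are hypothetical and chosen only for illustration.

```python
from collections import Counter

def generalize_age(age, width=10):
    """Generalize an exact age to a band, e.g. 23 -> '20-29' (hypothetical rule)."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"

def anonymize(records):
    """Generalize the assumed quasi-identifiers; keep the sensitive attribute."""
    return [
        {
            "age": generalize_age(r["age"]),
            "zip": r["zip"][:3] + "**",   # suppress the last two ZIP digits
            "diagnosis": r["diagnosis"],
        }
        for r in records
    ]

def equivalence_classes(records, quasi_identifiers):
    """Count records per combination of quasi-identifier values."""
    return Counter(tuple(r[q] for q in quasi_identifiers) for r in records)

def is_k_anonymous(records, quasi_identifiers, k):
    """True if every equivalence class contains at least k records."""
    classes = equivalence_classes(records, quasi_identifiers)
    return all(size >= k for size in classes.values())

# Made-up toy records, not taken from any data set used in the paper.
raw = [
    {"age": 23, "zip": "47677", "diagnosis": "flu"},
    {"age": 27, "zip": "47602", "diagnosis": "asthma"},
    {"age": 35, "zip": "47905", "diagnosis": "flu"},
    {"age": 38, "zip": "47909", "diagnosis": "diabetes"},
]
qi = ["age", "zip"]
anonymized = anonymize(raw)

print(is_k_anonymous(raw, qi, 2))         # every raw record is unique
print(is_k_anonymous(anonymized, qi, 2))  # two classes of two records each
```

In this toy case the generalized table satisfies k = 2 but not k = 3. Reaching a larger k would require coarser generalization, which is the trade-off between privacy and information loss that the abstract refers to.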