The arrival of big data has made people eager to mine patterns and knowledge from massive datasets. Combined with data mining techniques, big data has had a profound impact on business, medicine, energy, transportation, security, entertainment, and other fields. However, traditional data mining operates on the original data and therefore raises data privacy issues. Existing simple data processing methods cannot meet the demand for privacy protection, and related laws and regulations have restricted the application and development of data mining techniques. It is therefore necessary to find technical solutions that can discover knowledge from data while preserving privacy.

Building on the relevant data mining algorithms, this work studies attribute-oriented and relationship-oriented privacy-preserving data mining. Sensitive information in the attributes of social entities gives rise to the attribute-oriented privacy-preserving problem. The topological structure of the relationships between entities is another source of private information, which gives rise to the relationship-oriented privacy-preserving problem. The research is divided into four topics, and the main contributions and innovations are as follows:

(1) Personalized privacy preservation: Existing privacy-preserving frequent itemset mining algorithms do not consider that different products or items need different levels of privacy protection. In practice, a customer cares little about disclosing a purchase of daily necessities but is seriously concerned about privacy when the purchase includes sensitive items. To account for these personalized privacy requirements, a new data distortion method based on randomized response is proposed, which applies different levels of privacy protection to different items.
With the proposed method, the original itemset supports can be reconstructed from the perturbed dataset, and the frequent itemsets of the original dataset can then be recovered with a modified Apriori algorithm. Because it meets the personalized privacy protection needs of different items, the proposed approach recovers more of the frequent itemsets of the original dataset than related methods.

(2) Privacy-preserving frequent patterns: Before sharing or publishing their data, companies often want to hide competitively sensitive knowledge contained in the data. In the context of association rule mining, they want to hide specific frequent patterns, called "restrictive patterns". To solve this problem, the concept of the "conflict degree" of an item is put forward, and a new heuristic algorithm is proposed to sanitize the original data. To hide the restrictive patterns, the algorithm iteratively selects the item with the maximum conflict degree across all transactions and removes it from its transaction. It iteratively updates the conflict degree of each item and uses an inverted file index to speed up retrieval. Compared with related algorithms, the proposed algorithm hides fewer non-restrictive patterns while hiding the restrictive patterns, and it also makes fewer modifications to the original transaction data.

(3) Distributed privacy-preserving data mining: The problem of privacy-preserving data mining over horizontally partitioned data is studied, in which different sites cooperate to build a shared data mining model without disclosing their private data. By combining data randomization with secure multi-party computation, a new hybrid privacy protection method is proposed, which incurs zero loss of accuracy for rotation-invariant data mining.
In addition, an efficient algorithm is proposed to achieve the relative maximum privacy level for the randomization method based on random orthogonal transformation, together with a more efficient inner product protocol. For collaborative filtering, a new privacy-preserving method is designed for the horizontally partitioned case.

(4) Relationship-oriented privacy preservation: For relationship-oriented privacy-preserving social network publication, the attacker's background knowledge is modeled and an attack model based on mutual friends is introduced. To defend against this attack, a new anonymity model called k-NMF anonymity is proposed, and two algorithms are devised to anonymize the original network so that it satisfies k-NMF anonymity. Because the proposed algorithms take the network topology into account during anonymization, they effectively retain the structural characteristics of the original network while preserving privacy, as the experimental results show. To make a social network satisfy both k-degree anonymity and k-NMF anonymity, a new anonymization method is proposed, and the experimental results demonstrate that it largely maintains the structural information of the original network.

The first three works address attribute-oriented privacy-preserving data mining and effectively reduce information loss while meeting the customer's demand for privacy protection. The last one focuses on relationship-oriented privacy preservation: it builds the privacy model and designs the corresponding anonymization algorithms. The proposed methods effectively maintain the structural information of the original network while providing privacy protection. Finally, some future extensions are presented.
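
As a rough illustration of the randomized-response idea behind topic (1), the following Python sketch perturbs each item with its own retention probability and then reconstructs single-item supports from the perturbed data. It is a generic, textbook-style randomized response under stated assumptions, not the distortion method proposed in this work: the function names `perturb` and `estimate_support`, the per-item probabilities, and the toy data are all hypothetical.

```python
import random


def perturb(transactions, keep_prob, rng):
    """Randomize each 0/1 entry: keep it with probability keep_prob[item],
    otherwise flip it. A lower keep probability gives stronger protection."""
    return [
        {item: (bit if rng.random() < keep_prob[item] else 1 - bit)
         for item, bit in row.items()}
        for row in transactions
    ]


def estimate_support(perturbed, item, p):
    """Invert the randomization for a single item.

    If f is the true support and p the keep probability, the observed
    support is o = p*f + (1-p)*(1-f), hence f = (o - (1-p)) / (2p - 1).
    This requires p != 0.5.
    """
    observed = sum(row[item] for row in perturbed) / len(perturbed)
    return (observed - (1 - p)) / (2 * p - 1)


# Hypothetical setup: the sensitive item gets a lower keep probability.
rng = random.Random(0)
keep_prob = {"bread": 0.9, "medicine": 0.7}
true_support = {"bread": 0.6, "medicine": 0.1}
n = 20000

# Synthetic transactions in which each item appears independently.
transactions = [
    {item: (1 if rng.random() < s else 0) for item, s in true_support.items()}
    for _ in range(n)
]

perturbed = perturb(transactions, keep_prob, rng)
estimates = {
    item: estimate_support(perturbed, item, keep_prob[item])
    for item in keep_prob
}
```

Under these assumptions the values in `estimates` should lie close to the true supports, with the more strongly protected item giving the noisier estimate. Reconstructing the support of multi-item itemsets and feeding it into a modified Apriori algorithm, as described in topic (1), is beyond this sketch.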