Over the past 30 years,data has increased on a large scale in various field.Such massive amounts of data may have potential correlations or patterns,which can be mined for more interesting knowledge.However,it can also pose a threat to privacy while analysts mine data without any restrictions,which will inevitably lead to serious privacy problems.Data aggregation with local differential privacy(LDP)guarantees strong and fine-grained privacy for users by providing plausible deniability of sensitive data that has a broad application prospect.This thesis focuses on the privacy-preserving mining frequent patterns task in the context of LDP,and the main work is as follows:(1)This thesis proposes a frequent pattern tree(FP-Tree)based approach to discover frequent itemsets in the set-valued data setting,which has good extensibility and availability.It is the first time to construct one noise FP-Tree in a breadth-first way while satifying LDP,and the designed pruning algorithm of tree nodes can effectively reduce the scale of injected noise.Then,two post-processing optimizations,named imposing consistency and weighted combination,are designed to identify consistent itemsets.The former exploits consistency among nodes so that they will not break constraints.And the latter increases effectiveness of all published results.Finally,the strong privacy and high usability of the proposed approach are demonstrated by the experimental analysis with the leading SVSM approach as well as the theoretical analysis and proof.(2)This thesis proposes an extended FP-Tree based top-k query approach in the context of LDP,which can be applied to multiset data scenarios to collect frequent high-utility items.Unlike the set-valued data scenarios,which implicitly assumes that each item has equal value or utility,multiset data marks the utility attributes of items and assumes that the utility varies between items.The proposed approach constructs the noise FP-Tree by using key-value data collection protocol,i.e.,PCKV,and optimizes it to solve the problem of private storage and query over multiset data.Finally,taking refined O-PCKV as the comparison approach,the performance differences of two approaches are compared in the real dataset experiments. |