
Design And Research On Differentially Private Frequent Sequence Mining

Posted on: 2016-05-14 | Degree: Master | Type: Thesis
Country: China | Candidate: Z Y Li
GTID: 2298330467993125 | Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of modern science and technology, discovering useful knowledge in a timely manner from the vast amount of available information has become an urgent problem. Frequent sequential pattern mining aims to discover sequences that occur with high frequency in sequential data (such as temporal or spatial sequence data) and to treat these sequences as new knowledge patterns. Although frequent sequential pattern mining has become an effective means of knowledge discovery, releasing the mined frequent sequences directly may leak private information about the underlying data. To address this problem, we propose PFS (differentially Private Frequent Sequence mining), a differentially private frequent sequence mining (FSM) algorithm. Databases may contain long records, which cause problems for differentially private mining. To cope with these problems, PFS adopts three strategies: database sampling, transaction length reduction, and threshold decrease. Together, these strategies allow PFS to effectively control the amount of noise required by differential privacy, so that it provides high data utility while preserving data privacy. Experimental results show that PFS substantially outperforms state-of-the-art techniques.

To show that the transaction truncation strategy used in the record-length limitation method is broadly applicable, we also apply it to frequent itemset mining. The resulting differentially private frequent itemset mining algorithm, DAT (Differentially private Apriori based on Transaction Truncating), also performs well.
Keywords/Search Tags: data mining, frequent sequence, differential privacy, database sampling, length reduction, threshold decrease
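The abstract does not describe how DAT truncates transactions or calibrates its noise. The sketch below is a generic illustration of why limiting transaction length helps: it is not the thesis's algorithm. It makes three assumptions:

- Neighbouring databases differ by one transaction.
- Supports are released for every k-itemset over a public item universe.
- The standard Laplace mechanism is used.

Once each transaction is cut to at most `max_len` items, one transaction can change at most C(max_len, k) support counts by 1 each. That bounds the sensitivity and therefore the noise scale. The functions `truncate`, `exact_truncated_supports`, and `noisy_frequent_itemsets`, the parameter `max_len`, and the truncation rule itself (keeping the smallest items) are hypothetical.

```python
import itertools
import math
import random
from collections import Counter


def truncate(transaction, max_len):
    """Hypothetical truncation rule: keep the max_len smallest distinct items.

    The result depends only on this one transaction, so truncation never lets
    a single transaction affect more than its own contribution to the counts.
    """
    return sorted(set(transaction))[:max_len]


def laplace(scale, rng):
    """Draw Laplace(0, scale) noise as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def exact_truncated_supports(db, universe, k, max_len):
    """Support of every k-itemset over a public item universe, after truncation."""
    counts = Counter()
    for t in db:
        counts.update(itertools.combinations(truncate(t, max_len), k))
    # Report every candidate, including zero counts: listing only observed
    # itemsets would itself reveal which itemsets occur in the data.
    return {c: counts[c] for c in itertools.combinations(sorted(universe), k)}


def noisy_frequent_itemsets(db, universe, k, max_len, epsilon, threshold, seed=None):
    """Release k-itemsets whose Laplace-noised support reaches `threshold`."""
    if not 1 <= k <= max_len:
        raise ValueError("need 1 <= k <= max_len")
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = random.Random(seed)
    # A truncated transaction holds at most C(max_len, k) k-itemsets, so adding
    # or removing one transaction changes the count vector by at most that
    # amount in L1 norm; Laplace noise of scale sensitivity/epsilon on every
    # count gives epsilon-differential privacy for the whole vector.
    sensitivity = math.comb(max_len, k)
    scale = sensitivity / epsilon
    noisy = {
        c: s + laplace(scale, rng)
        for c, s in exact_truncated_supports(db, universe, k, max_len).items()
    }
    # Thresholding noisy values is post-processing and costs no extra privacy.
    return {c: s for c, s in noisy.items() if s >= threshold}
```

Consider `db = [["a", "b", "c", "d"], ["a", "b"], ["a", "c"], ["b", "c", "d"]]`, `k=2`, and `max_len=2`. Here `exact_truncated_supports` reports a support of 2 for `("a", "b")` but 0 for `("c", "d")`, even though the untruncated support of `("c", "d")` is 2. Truncation can only lower supports. This is the trade-off a smaller `max_len` makes: less noise, but more under-counting. It is also one plausible reason to pair truncation with a lowered threshold, loosely in the spirit of the abstract's "threshold decrease" strategy.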