Distributed Sequential Pattern Mining Algorithm Based On Privacy Preserving

Posted on: 2009-07-17
Degree: Master
Type: Thesis
Country: China
Candidate: P Chang
Full Text: PDF
GTID: 2178360275450865
Subject: Computer application technology
Abstract/Summary:
With the rapid development of the Internet, large amounts of data generated by industry applications may be stored across distributed sites. When mining sequential patterns from such data, special requirements may rule out transmitting the data, so sequential pattern mining algorithms designed for a stand-alone environment may no longer be applicable. In addition, the sequential pattern mining process can disclose sensitive information, especially in a distributed environment. Current distributed data mining and privacy-preserving algorithms are mainly concerned with association rule mining, and research on privacy preservation for sequential pattern mining is lacking. As a result, research on privacy-preserving distributed sequential pattern mining algorithms has important theoretical and practical significance.

Building on a study of current sequential pattern mining and privacy-preserving algorithms, and considering the characteristics of the distributed environment, this thesis improves on the PrefixSpan algorithm, borrows ideas from privacy-preserving algorithms for association rules, and proposes a privacy-preserving distributed sequential pattern mining algorithm. The main work is as follows:

1. Study typical sequential pattern mining and distributed data mining algorithms and analyse the characteristics of the PrefixSpan algorithm; then, based on the characteristics of distributed computing, propose a distributed sequential pattern mining algorithm, DSPM (Distributed Sequential Pattern Mining). The idea and flow of the algorithm are described in detail.
2. Given the high cost of data transmission, the fact that tasks can be executed in parallel, and other characteristics of the distributed environment, improve DSPM and propose several performance improvement strategies. These were applied in the prototype system and improved its performance.
3. Analyse the ideas behind typical privacy-preserving algorithms for association rules, compare association rule mining with sequential pattern mining, and propose a privacy-preserving algorithm for distributed sequential pattern mining, CLSD (Current Least Sequences Delete). The algorithm deletes original sequences to reduce the support of sensitive sequences, thereby hiding them.
4. Based on the DSPM and CLSD algorithms, implement in Java a prototype system for privacy-preserving distributed sequential pattern mining. The system uses serialization/deserialization, multi-threading, and other techniques to improve its efficiency.
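The idea of hiding a sensitive sequence by deleting original sequences until its support drops can be illustrated with a minimal Python sketch. This is not the thesis's CLSD algorithm, whose selection rule is not given in the abstract. The sketch assumes that each sequence is a tuple of single items, that support is an absolute count, and that the shortest supporting sequences are deleted first. The names `is_subsequence`, `support`, and `hide_by_deletion` are hypothetical.

```python
from typing import List, Tuple

Sequence = Tuple[str, ...]


def is_subsequence(pattern: Sequence, sequence: Sequence) -> bool:
    """True if the items of `pattern` occur in `sequence` in the same order."""
    remaining = iter(sequence)
    # `item in remaining` advances the iterator, so order is enforced.
    return all(item in remaining for item in pattern)


def support(pattern: Sequence, database: List[Sequence]) -> int:
    """Number of database sequences that contain `pattern`."""
    return sum(1 for seq in database if is_subsequence(pattern, seq))


def hide_by_deletion(database: List[Sequence], sensitive: Sequence,
                     min_support: int) -> List[Sequence]:
    """Delete supporting sequences until support(sensitive) < min_support.

    Assumption: the shortest supporters are deleted first. This ordering is
    an illustrative heuristic, not the rule used by CLSD.
    """
    sanitized = list(database)
    supporters = sorted(
        (seq for seq in sanitized if is_subsequence(sensitive, seq)), key=len
    )
    # The pattern stays hidden if at most min_support - 1 supporters remain.
    excess = len(supporters) - (min_support - 1)
    for victim in supporters[:max(excess, 0)]:
        sanitized.remove(victim)
    return sanitized


if __name__ == "__main__":
    db = [("a", "b", "c"), ("a", "c"), ("b", "c"), ("a", "b"), ("a", "d", "b", "c")]
    sensitive = ("a", "b")
    print(support(sensitive, db))
    print(support(sensitive, hide_by_deletion(db, sensitive, min_support=2)))
```

Deleting whole sequences also lowers the support of non-sensitive patterns contained in them, so any real selection rule has to weigh that side effect against the hiding goal.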
Keywords/Search Tags:data mining, distributed data mining, sequential pattern, privacy preserving, sensitive knowledge