Research On Algorithm For Mining Distributed Closed Sequence Pattern

Posted on: 2015-10-08
Degree: Master
Type: Thesis
Country: China
Candidate: Y Su
GTID: 2298330431986346
Subject: Computer software and theory
Abstract/Summary:
With the continuous development of network technology, more and more data are stored in distributed databases, which are now widely used. Discovering interesting and useful information in these massive, distributed data has become an important task for data mining.

Current distributed sequential pattern mining techniques for massive, distributed data are mostly dedicated to obtaining the complete set of sequential patterns, and problems such as poor operating efficiency are common. Compared with mining distributed sequential patterns, mining distributed closed sequential patterns yields a more compact result while preserving the completeness of the information. This thesis therefore focused on distributed closed sequential pattern mining.

First, recent distributed sequential pattern mining algorithms suffer from an overly large result set and high network communication cost when they mine large-scale sequence data held in distributed storage. To address this, and building on a thorough study of distributed sequential pattern mining technology and the characteristics of closed sequential patterns, a distributed algorithm for mining closed sequential patterns based on an improved sequence tree was proposed. The algorithm mined closed sequential patterns in a distributed environment; its result set was small and contained no redundant information. The algorithm adopted a master-slave design in which the master site and the slave sites completed the mining task together. As a result, it achieved lower communication cost and high parallelism.

Second, closure detection of sequences is a necessary and important operation in distributed mining of global closed sequential patterns. The sequence tree was improved to assist closure detection of sequences. Based on the improved sequence tree and the characteristics of closed sequential patterns, a sequence closure detection method was put forward. This method greatly reduced the search space and effectively avoided unnecessary inclusion-relation checks between sequences.

Finally, the characteristics of the proposed algorithm were analyzed through experiments. The experimental results showed that the algorithm had better efficiency and effectiveness.

In summary, this thesis first introduced the related concepts and technologies of distributed and closed sequential pattern mining, then proposed the distributed algorithm for mining closed sequential patterns based on an improved sequence tree, then proposed a scheme for sequence closure detection in a distributed environment, and further gave the sequence closure detection method based on the improved sequence tree. Finally, the experimental results verified the feasibility of the proposed algorithm.
Keywords/Search Tags: Data mining, Distributed, Sequence pattern, Closed sequence pattern, Sequence tree
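
To make the notion of a closed sequential pattern concrete, here is a minimal Python sketch under stated assumptions. It uses the standard definition (a pattern is closed if no proper super-sequence has the same support) and a naive pairwise inclusion check. It is not the thesis's improved-sequence-tree method, which the abstract says is designed to avoid exactly these unnecessary inclusion checks. The names `is_subsequence`, `closed_patterns`, and the sample supports are hypothetical and chosen only for illustration.

```python
from typing import Dict, Tuple

# Assumption: a pattern is a tuple of single items, e.g. ("a", "b").
Pattern = Tuple[str, ...]


def is_subsequence(short: Pattern, long: Pattern) -> bool:
    """Return True if `short` occurs in `long` in order (gaps allowed)."""
    it = iter(long)
    # `item in it` advances the iterator, so order is respected.
    return all(item in it for item in short)


def closed_patterns(supports: Dict[Pattern, int]) -> Dict[Pattern, int]:
    """Keep only patterns with no longer super-sequence of equal support.

    Naive O(n^2) pairwise check, for illustration only.
    """
    closed = {}
    for p, sup in supports.items():
        absorbed = any(
            len(q) > len(p) and sup_q == sup and is_subsequence(p, q)
            for q, sup_q in supports.items()
        )
        if not absorbed:
            closed[p] = sup
    return closed


# Hypothetical global supports, e.g. as a master site might hold them
# after combining counts reported by slave sites.
supports = {
    ("a",): 3,
    ("b",): 2,
    ("a", "b"): 2,
    ("c",): 1,
    ("a", "c"): 1,
}

result = closed_patterns(supports)
print(result)
```

In this sample, `("b",)` and `("c",)` are dropped because `("a", "b")` and `("a", "c")` contain them with the same support, which is how the closed set stays smaller without losing support information.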