
Research on a Parallel Mining Algorithm for Spatial Co-Location Patterns Based on Hadoop

Posted on: 2016-07-05  Degree: Master  Type: Thesis
Country: China  Candidate: D D Zhang
GTID: 2208330470956132  Subject: Computer system architecture
Abstract/Summary:
Traditional co-location pattern mining algorithms can only mine prevalent co-location patterns, that is, sets of spatial features whose instances are frequently located close together in space. During the mining of spatial co-location patterns, however, some patterns fall below the given prevalence threshold yet exhibit strong negative correlation; these are negative co-location patterns. They may contain valuable information, and their influence on decision making cannot be ignored. Researchers have therefore shifted their attention to negative co-location patterns and have proposed algorithms that mine them on the basis of prevalent co-location pattern mining. Mining negative patterns is difficult for two reasons: algorithms for mining negative association rules and prevalent co-location patterns cannot simply be applied to negative co-location patterns, and the number of negative patterns derived from spatial data sets is huge, which makes the mining process time-consuming.
The MapReduce parallel computing framework on the Hadoop platform offers a well-suited solution for mining negative co-location patterns, thanks to its strong parallel processing capability, robustness, scalability, and open-source availability. This thesis therefore proposes a mining algorithm based on parallel computing that mines prevalent co-location patterns and interesting negative co-location patterns simultaneously from spatial data sets. Experiments show that the parallel algorithm addresses the problems described above and improves both time and space efficiency.
The contents are as follows. Firstly, this thesis reviews the research status of co-location pattern mining and introduces the research contents, achievements, and related concepts of spatial prevalent and negative co-location pattern mining. Secondly, it analyzes the significance and difficulty of mining negative co-location patterns and explains the motivation for a parallel mining algorithm. Thirdly, it describes the parallel algorithm for mining spatial prevalent co-location patterns and interesting negative co-location patterns, introduces the data partition and distribution algorithms used in the mining process, and gives a detailed theoretical analysis of the correctness and the time and space complexity of the parallel algorithm. Fourthly, it reports experiments on real data sets and analyzes the effects of different parameters on the performance of the parallel algorithm. Finally, it briefly summarizes the main work and discusses directions for future study.
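To make the idea of a prevalence threshold concrete, the following is a minimal, single-machine sketch in Python. It assumes the participation index commonly used in the co-location literature as the prevalence measure; the abstract does not state which measure the thesis uses, and the sketch is neither the thesis's parallel algorithm nor a test for negative patterns. The function name, the sample coordinates, the distance radius, and the threshold value are all hypothetical.

```python
from itertools import product
from math import dist


def participation_index(instances, pattern, radius):
    """Participation index of `pattern` (an assumed prevalence measure).

    instances: dict mapping a feature name to a non-empty list of (x, y) points.
    pattern:   tuple of feature names.
    radius:    two instances are neighbors if their distance is <= radius.

    A row instance is one point per feature such that all points are
    pairwise neighbors. The index is the minimum, over the features in the
    pattern, of the fraction of that feature's points that take part in at
    least one row instance.
    """
    participating = {f: set() for f in pattern}
    index_ranges = [range(len(instances[f])) for f in pattern]
    # Brute-force enumeration: fine for a toy example, not for real data.
    for combo in product(*index_ranges):
        pts = [instances[f][i] for f, i in zip(pattern, combo)]
        is_clique = all(
            dist(p, q) <= radius
            for k, p in enumerate(pts)
            for q in pts[k + 1:]
        )
        if is_clique:
            for f, i in zip(pattern, combo):
                participating[f].add(i)
    return min(len(participating[f]) / len(instances[f]) for f in pattern)


# Hypothetical toy data set with three spatial features.
data = {
    "A": [(0, 0), (5, 5), (10, 10)],
    "B": [(0, 1), (5, 6)],
    "C": [(20, 20), (0, 0.5)],
}
RADIUS = 1.5          # assumed neighborhood distance
MIN_PREVALENCE = 0.5  # assumed prevalence threshold

for candidate in [("A", "B"), ("A", "C")]:
    pi = participation_index(data, candidate, RADIUS)
    status = "prevalent" if pi >= MIN_PREVALENCE else "below threshold"
    print(candidate, round(pi, 3), status)
```

In this toy data, the pattern (A, B) reaches the assumed threshold while (A, C) does not. Patterns of the second kind are the ones that would need a further negative-correlation check, whose definition the abstract does not give.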
Keywords/Search Tags:Prevalence co-location pattern, Negative co-location pattern, MapReduce, Parallel mining, Data partition