
Region-based Spatial-textual Similarity Join in MapReduce

Posted on: 2016-04-13
Degree: Master
Type: Thesis
Country: China
Candidate: J N Zhang
Full Text: PDF
GTID: 2308330479951007
Subject: Computer application technology
Abstract/Summary:
With the popularity of portable mobile devices, location-based geographic information services have become increasingly important in people's lives. As user needs evolve, region-based spatial-textual data has emerged, and the region-based spatial-textual similarity join has become an important operation that is widely used in real-life applications such as social recommendation. However, as data volumes grow, it is difficult to perform this operation effectively on large-scale data with a single centralized machine, and existing work based on the MapReduce framework does not take users' spatial and textual information into account simultaneously. This thesis therefore studies the region-based spatial-textual similarity join using the MapReduce framework.

First, we propose a method for performing the region-based spatial-textual similarity join with a threshold constraint on MapReduce, which addresses the problem that this operation cannot be performed effectively on a centralized machine for large-scale data. The method consists of two phases: the first generates a global ordering of textual signatures, and the second performs the similarity join. We develop a data partitioning strategy based on the M-restrict-rectangle to reduce the amount of data replication; it not only reduces the computation on each node but also prunes some dissimilar object pairs. We also propose a grid-based duplication avoidance strategy to avoid repeated computation of similar object pairs.

Second, we propose a method for performing the region-based spatial-textual top-k join with ranking scores on MapReduce, which addresses the problem that an unsuitable threshold may produce too many or too few join results. The method consists of three phases: the first generates a global ordering of textual signatures, the second performs local top-k similarity joins, and the third obtains the final top-k similarity join results. We develop an early termination strategy to reduce unnecessary computation.

Finally, we test the methods on random datasets to verify their effectiveness, and the experimental results confirm it.
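The two-phase threshold join described above can be illustrated with a small single-process sketch in Python. It is not the thesis's implementation, and the abstract does not define its similarity functions or the M-restrict-rectangle partitioning, so everything here is an assumption chosen for illustration: regions are axis-aligned rectangles `(x1, y1, x2, y2)`, both spatial and textual similarity are Jaccard ratios, the global ordering ranks tokens by ascending frequency for standard prefix filtering, partitioning is a plain uniform grid with the hypothetical cell width `CELL`, and duplicates are avoided with the common reference-point rule (a pair is reported only in the cell holding the lower-left corner of the two regions' intersection).

```python
from collections import Counter, defaultdict
from itertools import combinations
from math import ceil, floor

CELL = 10.0  # hypothetical grid cell width (assumption, not from the thesis)

def global_order(objects):
    """Phase 1: rank tokens by ascending frequency, so rare tokens come first."""
    freq = Counter(tok for _, _, toks in objects for tok in toks)
    ranked = sorted(freq, key=lambda t: (freq[t], t))
    return {tok: rank for rank, tok in enumerate(ranked)}

def prefix(tokens, order, t):
    """Prefix-filter signature: two sets with Jaccard >= t must share a prefix token."""
    s = sorted(tokens, key=order.__getitem__)
    return set(s[: len(s) - ceil(t * len(s)) + 1])

def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def area(r):
    return max(0.0, r[2] - r[0]) * max(0.0, r[3] - r[1])

def intersection(r, s):
    return (max(r[0], s[0]), max(r[1], s[1]), min(r[2], s[2]), min(r[3], s[3]))

def spatial_sim(r, s):
    inter = area(intersection(r, s))
    union = area(r) + area(s) - inter
    return inter / union if union else 0.0

def cells(r):
    """All grid cells that rectangle r overlaps."""
    for cx in range(floor(r[0] / CELL), floor(r[2] / CELL) + 1):
        for cy in range(floor(r[1] / CELL), floor(r[3] / CELL) + 1):
            yield (cx, cy)

def similarity_join(objects, t_spatial, t_text):
    order = global_order(objects)
    # "Map": replicate each object to every grid cell its region overlaps.
    buckets = defaultdict(list)
    for oid, rect, toks in objects:
        sig = prefix(toks, order, t_text)
        for cell in cells(rect):
            buckets[cell].append((oid, rect, toks, sig))
    # "Reduce": verify candidate pairs independently inside each cell.
    results = []
    for cell, group in buckets.items():
        for a, b in combinations(group, 2):
            ir = intersection(a[1], b[1])
            if ir[0] > ir[2] or ir[1] > ir[3]:
                continue  # regions are disjoint
            # Duplicate avoidance: only the cell owning the reference point reports.
            if (floor(ir[0] / CELL), floor(ir[1] / CELL)) != cell:
                continue
            if not (a[3] & b[3]):
                continue  # prefix filter: no shared signature token
            if spatial_sim(a[1], b[1]) >= t_spatial and jaccard(a[2], b[2]) >= t_text:
                results.append(tuple(sorted((a[0], b[0]))))
    return sorted(results)

if __name__ == "__main__":
    data = [
        (1, (0, 0, 12, 12), {"coffee", "wifi", "park"}),
        (2, (1, 1, 13, 13), {"coffee", "wifi", "bar"}),
        (3, (50, 50, 60, 60), {"coffee", "wifi", "park"}),
        (4, (0, 0, 12, 12), {"museum", "art", "tour"}),
    ]
    print(similarity_join(data, 0.5, 0.5))  # prints [(1, 2)]
```

In the sample data, objects 1 and 2 are replicated to the same four grid cells, yet the pair is reported once because only one cell contains the reference point. In an actual MapReduce job, the bucket key would serve as the shuffle key and each bucket would be processed by a separate reducer.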
Keywords/Search Tags: spatial-textual similarity join, MapReduce, M-restrict-rectangle, avoid redundancy