Field-Driven Space-Efficient Co-Location Pattern Mining

Posted on: 2017-02-14
Degree: Master
Type: Thesis
Country: China
Candidate: W G Jiang
Full Text: PDF
GTID: 2278330488466900
Subject: Computer application technology
Abstract/Summary:
With the rapid development of the mobile Internet, more and more data is associated with spatial location information, and this data is relevant to our daily lives. The popularity of mobile devices makes spatial data easier to produce and collect, and such data is typically huge and multi-dimensional. Over the past decade, spatial data storage technology has matured, and knowledge discovery from huge amounts of spatial data has gradually become a research hotspot. Online services such as Alibaba, BaiduMap, DiDi and Meituan can produce petabytes of spatial data every day. This data contains potential knowledge that can help enterprises optimize their services and find new business opportunities.

A co-location pattern is a subset of the spatial feature set whose instances are frequently located together in some regions. For example, malaria tends to occur in mosquito-breeding areas. Most previous studies take the prevalence of co-location patterns (participation index, PI) or the utility ratio of co-location patterns (PUR) as the interestingness measure. These measures do not fully consider the differences between features or the diversity among instances of the same feature, although such differences exist in real-world data. It is therefore meaningful to take the utility value of each instance into account in the interestingness measure for spatial co-location pattern mining.

Traditional data-driven mining focuses on automating the mining process and reducing user participation as far as possible. It usually yields non-actionable knowledge that contains a large amount of useless information, some wrong conclusions and uninteresting results. In an actual knowledge discovery process, domain constraints, expert experience and user preferences are brought into the mining process, which finds meaningful knowledge of interest to the user more quickly and clearly improves the quality of the mining results.

After analyzing the deficiencies of traditional co-location pattern mining and traditional high utility co-location pattern mining, and considering the actual situation of real-world data, we propose a more general object of study: the spatial instance with a value. To evaluate the effect of a feature on co-location patterns comprehensively, we define the IntraUR (intra utility ratio) and the InterUR (inter utility ratio). We also propose a novel interestingness measure, the UPI (utility participation index), which extends the traditional PI and takes both utility and prevalence into account.

We formalize the domain knowledge that may be used in co-location pattern mining as three semantic rules and incorporate them into the mining process. With an iterative mining method, users can extract new domain knowledge from the previous mining results and apply it to the next round of mining.

Taking the domain knowledge and the new interestingness measure UPI into consideration, we propose a basic algorithm and several pruning strategies, and give proofs for these pruning strategies. To evaluate the quality of the mining results, we propose a quality evaluation index for co-location patterns. We analyze the differences in prevalence and quality among mining results based on PI, PUR and UPI. The experimental results show that, in terms of prevalence, the result based on UPI is higher than that based on PUR but slightly lower than that based on PI; in terms of pattern quality, the result based on UPI is higher than those based on PUR and PI. The algorithm with pruning strategies is correct and more efficient than the basic algorithm. In addition, importing the domain knowledge has a clear effect on the number of mining results.
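
To make the contrast between prevalence and utility concrete, the following Python sketch computes the traditional participation index (PI) for one candidate pattern, using its usual definition as the minimum participation ratio over the pattern's features. The abstract does not give the formulas for UPI, IntraUR or InterUR, so the second function is only a hypothetical utility-weighted variant written for illustration and should not be read as the thesis's definition. All names and the toy data are assumptions.

```python
from collections import defaultdict


def _participating(row_instances):
    """Collect, per feature, the distinct instances that appear in any row instance."""
    seen = defaultdict(set)
    for row in row_instances:
        for feature, instance_id in row.items():
            seen[feature].add(instance_id)
    return seen


def participation_index(row_instances, feature_counts):
    """Traditional PI: minimum over features of
    (distinct participating instances) / (all instances of that feature)."""
    seen = _participating(row_instances)
    ratios = {f: len(seen[f]) / n for f, n in feature_counts.items()}
    return min(ratios.values()), ratios


def utility_weighted_index(row_instances, utilities):
    """Hypothetical variant (an assumption, not the thesis's UPI): each instance
    counts by its utility value instead of counting as 1."""
    seen = _participating(row_instances)
    ratios = {}
    for feature, inst_utils in utilities.items():
        total = sum(inst_utils.values())
        part = sum(inst_utils[i] for i in seen[feature])
        ratios[feature] = part / total
    return min(ratios.values()), ratios


# Toy data: pattern {A, B} with three row instances (co-located instance pairs).
rows = [{"A": "a1", "B": "b1"}, {"A": "a2", "B": "b1"}, {"A": "a3", "B": "b2"}]
counts = {"A": 4, "B": 2}
utils = {
    "A": {"a1": 10, "a2": 1, "a3": 1, "a4": 8},  # a4 is valuable but never co-located
    "B": {"b1": 5, "b2": 5},
}

pi, pi_ratios = participation_index(rows, counts)
uw, uw_ratios = utility_weighted_index(rows, utils)
print(pi, uw)
```

In this toy data, three of the four instances of A participate, but the non-participating instance a4 carries a large share of A's total utility, so the utility-weighted ratio for A is lower than its plain participation ratio. This is the kind of difference between instances of the same feature that a count-based measure cannot see.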
Keywords/Search Tags: Spatial data mining, Spatial co-location patterns, High utility co-location pattern mining, Interestingness measure, Domain-driven