
A Parallel And Optimized Approach To Detecting Associations Among Large-scale Data Set

Posted on: 2017-04-14 | Degree: Master | Type: Thesis
Country: China | Candidate: Y Li | Full Text: PDF
GTID: 2308330485492450 | Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of the Internet, huge amounts of data have been generated in various industries. However, raw data are usually disorganized, which leads to poor utilization and low value density. Finding the associations among data is necessary in order to further mine their potential value and significance. Mining associations among large-scale data sets has therefore become an important issue in data integration and data mining research. In particular, the detection of spatio-temporal associations is a hot topic in many research fields and has a wide range of applications. The main idea is to find associations among different data records along the time and space dimensions. However, the huge amount of data poses a critical challenge. This thesis aims to design a parallel approach to detecting such associations among large-scale Automatic Number Plate Recognition (ANPR) data. Moreover, the thesis applies the approach to a real scenario: detecting fake plates. Based on an in-depth analysis of previous research, the thesis focuses on the definitions, principles, and parallel detection algorithms of spatio-temporal associations. The main contributions are:
1. Drawing on related work, we give the definitions and model of spatio-temporal contradiction associations, and propose the principles for judging such associations.
2. Using the MapReduce framework, we propose an approach called FP-Detector that quickly detects spatio-temporal associations through parallel analysis of large-scale historical ANPR data. This approach overcomes the high cost and low efficiency of current detection approaches. Within it, a blocking strategy based on Linear Partition theory is proposed to solve the data skew problem and keep the load balanced across computation nodes. This strategy can greatly improve the performance of association detection.
3. We conduct several experiments on a real large-scale ANPR data set to verify our approach. The experimental results show the effectiveness and efficiency of FP-Detector.
4. We apply FP-Detector to a real scenario of detecting fake plates in a large city, and implement a prototype system based on our approach.
Keywords/Search Tags:ANPR data, spatio-temporal associations, blocking strategy, load balance, MapReduce
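The abstract does not spell out how the blocking strategy in contribution 2 is formulated. As a rough illustration only, the sketch below shows the classical linear partition problem that the name suggests: split an ordered sequence of weights into k contiguous groups so that the largest group sum is as small as possible. The function name `linear_partition`, the reading of weights as per-block record counts, and the mapping of groups to computation nodes are all assumptions made for this example, not details taken from the thesis.

```python
from itertools import accumulate


def linear_partition(weights, k):
    """Split `weights` into contiguous groups, minimizing the largest group sum.

    Hypothetical helper: `weights` stands for record counts of ordered data
    blocks and `k` for the number of computation nodes (both assumptions).
    Weights are assumed to be non-negative.
    """
    n = len(weights)
    if n == 0:
        return []
    k = min(k, n)  # never create more groups than there are items
    prefix = [0] + list(accumulate(weights))  # prefix[i] = sum of first i items
    inf = float("inf")
    # cost[i][j]: smallest achievable "largest group sum" when the first i
    # items are split into exactly j non-empty contiguous groups.
    cost = [[inf] * (k + 1) for _ in range(n + 1)]
    # cut[i][j]: index where the last group starts in that optimal split.
    cut = [[0] * (k + 1) for _ in range(n + 1)]
    cost[0][0] = 0
    for i in range(1, n + 1):
        for j in range(1, min(i, k) + 1):
            for x in range(j - 1, i):
                candidate = max(cost[x][j - 1], prefix[i] - prefix[x])
                if candidate < cost[i][j]:
                    cost[i][j] = candidate
                    cut[i][j] = x
    # Walk the cut table backwards to recover the groups.
    groups = []
    i, j = n, k
    while j > 0:
        x = cut[i][j]
        groups.append(weights[x:i])
        i, j = x, j - 1
    groups.reverse()
    return groups


if __name__ == "__main__":
    # Nine blocks with growing record counts, balanced over three nodes.
    print(linear_partition([1, 2, 3, 4, 5, 6, 7, 8, 9], 3))
```

This dynamic program takes O(k n^2) time, which is acceptable when n is a modest number of blocks rather than the number of raw records. In a MapReduce setting, a partitioner could use the resulting group boundaries to route blocks to reducers, although whether FP-Detector does so is not stated in the abstract.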