Optimization And Application Of The Equi-join Problem Based On Grid Big Data In Spark

Posted on: 2017-08-31 | Degree: Master | Type: Thesis
Country: China | Candidate: X J Pi
GTID: 2322330509954208 | Subject: Engineering
Abstract/Summary:
With the continuing development of Internet technology, the scale and variety of data in e-commerce, scientific research, social platforms and other fields have grown rapidly, marking the arrival of the big data era. In the power grid field, the growth of the Internet of Things and the widespread use of sensors keep increasing the volume of monitoring and collected data. Because grid monitoring data is huge in volume, diverse in type and must be processed in a timely way, traditional data processing technology can no longer meet the requirements, and the analysis of grid big data needs the support of big data technology.

Spark, a new and efficient big data computing framework, provides rich components and APIs, including Spark Streaming, GraphX, MLlib and Spark SQL. Statistical analysis of grid data involves join operations between tables. Spark connects tables with its join operation, but many records that have no match in the other table still take part in the shuffle, which makes the join inefficient.

To address the inefficiency of join and other practical problems in the statistical analysis of grid big data, this thesis first proposes a new filtering and partitioning algorithm based on the Bloom filter. The method filters out join records that have no match and then repartitions the data to counter data skew, making full use of the computing resources of every node and optimizing the join process as far as possible.

Finally, based on the statistical analysis requirements of the center at State Grid Chongqing Power Supply Company, this thesis proposes a power grid data processing model that combines Spark and Spark SQL. By integrating this model with Web J2EE technology, the resulting system implements data collection, computation, analysis and display.
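The abstract does not give the algorithm itself, so the following is only a minimal single-process Python sketch of the general Bloom-filter pre-filtering idea it describes, not the thesis's method. Assumptions: both tables are in-memory lists of `(key, value)` pairs, the smaller table's keys are inserted into a Bloom filter, rows of the larger table whose key the filter rejects are dropped before the simulated shuffle, and the survivors are hash-partitioned and joined locally. The names `BloomFilter`, `partition_of` and `bloom_filtered_join`, and the filter sizes, are hypothetical; the skew-aware partitioning step is not modeled. (In Spark's Scala API, a comparable building block is `df.stat.bloomFilter(...)`, which returns an `org.apache.spark.util.sketch.BloomFilter` that can be broadcast to executors.)

```python
import hashlib
from collections import defaultdict


class BloomFilter:
    """Bit-array Bloom filter with k positions derived from one SHA-256 digest."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key):
        # Double hashing: position_i = (h1 + i * h2) mod m
        digest = hashlib.sha256(repr(key).encode()).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force odd so h2 is never zero
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, key) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key) -> bool:
        # False means "definitely absent"; True may be a false positive.
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key))


def partition_of(key, num_partitions: int) -> int:
    # Deterministic hash partitioning (built-in hash() of str is randomized per process).
    digest = hashlib.md5(repr(key).encode()).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions


def bloom_filtered_join(small, large, num_partitions=4, num_bits=1024, num_hashes=3):
    """Equi-join two lists of (key, value) pairs. Rows of `large` whose key
    cannot match are dropped before the simulated shuffle.
    Returns (joined rows as (key, small_value, large_value), rows shuffled from `large`)."""
    bloom = BloomFilter(num_bits, num_hashes)
    for key, _ in small:
        bloom.add(key)

    # Filter step: only candidate rows of the large table are shuffled.
    candidates = [(k, v) for k, v in large if bloom.might_contain(k)]

    # Shuffle step: route both sides to partitions by join key.
    small_parts = defaultdict(list)
    large_parts = defaultdict(list)
    for k, v in small:
        small_parts[partition_of(k, num_partitions)].append((k, v))
    for k, v in candidates:
        large_parts[partition_of(k, num_partitions)].append((k, v))

    # Local hash join per partition; false positives simply find no match here.
    result = []
    for p in range(num_partitions):
        index = defaultdict(list)
        for k, v in small_parts[p]:
            index[k].append(v)
        for k, v in large_parts[p]:
            for sv in index.get(k, []):
                result.append((k, sv, v))
    return result, len(candidates)


if __name__ == "__main__":
    meters = [(i, f"meter-{i}") for i in range(10)]      # small dimension table
    readings = [(i % 1000, i) for i in range(5000)]      # large fact table
    joined, shuffled = bloom_filtered_join(meters, readings)
    print(len(joined))  # 50 joined rows
    print(shuffled)     # rows of `readings` that reached the shuffle
```

Because a Bloom filter has no false negatives, every matching row survives the filter, so the join result is exact; false positives only cost some extra shuffled rows, which the per-partition hash join discards. In this example only about 50 of the 5,000 fact rows should reach the shuffle, although the exact count depends on the filter's false-positive rate.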
Keywords/Search Tags: Spark, Spark SQL, Equi-join, Grid, Big data