
Design And Implementation Of The Tool For Data Analysis In Big Data Platform

Posted on: 2016-09-01
Degree: Master
Type: Thesis
Country: China
Candidate: S Wu
GTID: 2308330503477803
Subject: Software engineering

Abstract:
With the advent of the big data era, every industry stores and maintains ever-larger volumes of data, and behind this data lies enormous commercial value. However, the capability for big data analytics has not kept pace with the growth of the data, which leads to a dilemma: organizations hold vast stocks of data yet still lack useful information. Big data analysis faces two major problems: large-scale data integration and data mining. Traditional data integration methods cannot resolve the semantic conflicts that heterogeneous data introduces, and their efficiency drops markedly as the amount of data grows. Traditional data mining, limited by the performance of a single machine, is too inefficient for processing massive data. The emergence of cloud computing platforms offers a promising approach and methods for resolving this situation.

Based on the Hadoop platform, this thesis examines the state of research on, and the shortcomings of, traditional data integration solutions and data mining algorithms when processing big data. It designs and implements a data analysis tool on a big data platform, extending research on heterogeneous data integration and massive data analysis. The main work is as follows:

1) Heterogeneous data integration. Ontologies are introduced into the heterogeneous data integration process: a local ontology and a global ontology are used to produce mapping rules, and, guided by these rules, the heterogeneous data are extracted, transformed, and loaded into the Hive data warehouse on the Hadoop platform. In building the local ontology, the construction rules are converted into code so that the local ontology is built automatically. In the ontology mapping process, a comprehensive method for calculating ontology similarity is used to improve the accuracy of the mapping.

2) Data mining and analysis. Classical algorithms such as clustering, classification, and association rules are combined with the MapReduce parallel programming model on the Hadoop platform to design and implement parallel data mining algorithms. These algorithms mine and analyze the data using the powerful computing and storage capacity of the cloud platform, and they obtain good experimental results.

The proposed solution achieves the goals of heterogeneous data integration and data analysis on a big data platform. Introducing ontologies into heterogeneous data integration improves its accuracy and efficiency and automates the integration process, and combining it with Hive allows massive data to be processed efficiently. Parallelizing data mining algorithms with the MapReduce programming model on Hadoop greatly improves the efficiency of data mining on massive data, which gives the work good practical value.
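The abstract states that mapping rules are derived from a local and a global ontology using a "comprehensive" similarity calculation, but it does not name the measures or their weights. The sketch below is a hypothetical illustration of that idea, not the thesis's method: each ontology is reduced to a dict of concept name to attribute-name set, name similarity uses Python's `difflib.SequenceMatcher` ratio, attribute similarity uses Jaccard overlap, and the weights and acceptance threshold are arbitrary assumptions.

```python
from difflib import SequenceMatcher

# Hypothetical weights and threshold (assumptions, not taken from the thesis).
NAME_WEIGHT = 0.6
ATTR_WEIGHT = 0.4
THRESHOLD = 0.5  # minimum combined similarity for accepting a mapping


def name_similarity(a: str, b: str) -> float:
    """String similarity of two concept names, in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def attribute_similarity(attrs_a: set, attrs_b: set) -> float:
    """Jaccard overlap of the two concepts' attribute names."""
    if not attrs_a and not attrs_b:
        return 0.0
    return len(attrs_a & attrs_b) / len(attrs_a | attrs_b)


def concept_similarity(local, global_) -> float:
    """Weighted combination of name and attribute similarity."""
    (l_name, l_attrs), (g_name, g_attrs) = local, global_
    return (NAME_WEIGHT * name_similarity(l_name, g_name)
            + ATTR_WEIGHT * attribute_similarity(l_attrs, g_attrs))


def derive_mapping_rules(local_onto: dict, global_onto: dict) -> dict:
    """Map each local concept to its most similar global concept, if above THRESHOLD."""
    rules = {}
    for l_name, l_attrs in local_onto.items():
        best, best_score = None, 0.0
        for g_name, g_attrs in global_onto.items():
            score = concept_similarity((l_name, l_attrs), (g_name, g_attrs))
            if score > best_score:
                best, best_score = g_name, score
        if best is not None and best_score >= THRESHOLD:
            rules[l_name] = best
    return rules


if __name__ == "__main__":
    # Hypothetical local (source-schema) and global ontologies.
    local_onto = {"Cust": {"name", "phone"}, "Ord": {"id", "amount"}}
    global_onto = {"Customer": {"name", "phone", "email"},
                   "Order": {"id", "amount", "date"}}
    print(derive_mapping_rules(local_onto, global_onto))
    # {'Cust': 'Customer', 'Ord': 'Order'}
```

Combining a lexical score with a structural one means that an abbreviated source name such as `Cust` can still be matched when its attributes overlap with the global concept's. In the thesis's pipeline, rules of this kind would then guide the extraction, transformation, and loading into Hive, a step this sketch does not cover.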
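For the clustering part of the parallel data mining work, a common way to fit an iterative algorithm such as k-means into MapReduce is one job per iteration. In that scheme, the map phase assigns each point to its nearest centroid and the reduce phase averages the points for each centroid. The abstract does not say which clustering algorithm or job layout the thesis uses, so the following is only a sketch under that assumption. It simulates the map, shuffle, and reduce phases in plain Python rather than calling the Hadoop API.

```python
import math
from collections import defaultdict


def nearest(point, centroids):
    """Index of the centroid closest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))


def kmeans_map(point, centroids):
    """Map: emit (cluster_id, (partial_sum, count)) for one input point."""
    yield nearest(point, centroids), (list(point), 1)


def kmeans_reduce(cluster_id, values):
    """Reduce: add up partial sums and counts for one cluster, return its new centroid."""
    total, count = None, 0
    for vec, n in values:
        total = vec if total is None else [a + b for a, b in zip(total, vec)]
        count += n
    return cluster_id, tuple(x / count for x in total)


def kmeans_iteration(points, centroids):
    """One simulated MapReduce job: map all points, group by key, reduce per cluster."""
    groups = defaultdict(list)  # stands in for Hadoop's shuffle/sort phase
    for p in points:
        for key, value in kmeans_map(p, centroids):
            groups[key].append(value)
    new = list(centroids)  # a cluster that receives no points keeps its old centroid
    for key, values in groups.items():
        cid, centroid = kmeans_reduce(key, values)
        new[cid] = centroid
    return new


def kmeans(points, centroids, max_iter=20, tol=1e-6):
    """Driver loop: rerun the job until no centroid moves more than tol."""
    for _ in range(max_iter):
        new = kmeans_iteration(points, centroids)
        if all(math.dist(a, b) <= tol for a, b in zip(centroids, new)):
            return new
        centroids = new
    return centroids


if __name__ == "__main__":
    points = [(0, 0), (0, 1), (10, 10), (10, 11)]
    print(kmeans(points, [(0, 0), (10, 10)]))  # [(0.0, 0.5), (10.0, 10.5)]
```

The map output carries a (sum, count) pair instead of a bare point. Because summing these pairs is associative, a Hadoop combiner could pre-aggregate them on each node and reduce shuffle traffic.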
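Association-rule mining is also listed among the parallelized algorithms. One widely used MapReduce formulation, assumed here because the abstract does not describe the thesis's own design, is Apriori with one job per itemset size. In each job, the map phase emits `(itemset, 1)` for every candidate contained in a transaction, and the reduce phase sums the counts and keeps only the itemsets that meet a minimum support. The sketch below simulates this process in plain Python. It covers only frequent-itemset counting and omits the final step of generating rules with confidence.

```python
from collections import defaultdict
from itertools import combinations


def count_map(transaction, candidates):
    """Map: emit (itemset, 1) for every candidate contained in the transaction."""
    items = set(transaction)
    for cand in candidates:
        if cand <= items:
            yield cand, 1


def count_reduce(itemset, ones, min_support):
    """Reduce: total the counts for one itemset; keep it only if it is frequent."""
    total = sum(ones)
    return (itemset, total) if total >= min_support else None


def generate_candidates(frequent, k):
    """Join frequent (k-1)-itemsets into k-itemsets whose (k-1)-subsets are all frequent."""
    prev = set(frequent)
    joined = {a | b for a in prev for b in prev if len(a | b) == k}
    return {c for c in joined
            if all(frozenset(s) in prev for s in combinations(c, k - 1))}


def apriori(transactions, min_support):
    """Driver: one simulated MapReduce counting pass per itemset size k."""
    candidates = {frozenset([i]) for t in transactions for i in t}
    result, k = {}, 1
    while candidates:
        groups = defaultdict(list)  # simulated shuffle: itemset -> list of 1s
        for t in transactions:
            for key, value in count_map(t, candidates):
                groups[key].append(value)
        frequent = {}
        for key, ones in groups.items():
            kept = count_reduce(key, ones, min_support)
            if kept is not None:
                frequent[kept[0]] = kept[1]
        result.update(frequent)
        k += 1
        candidates = generate_candidates(frequent, k)
    return result


if __name__ == "__main__":
    baskets = [["bread", "milk"], ["bread", "butter"],
               ["bread", "milk", "butter"], ["milk"]]
    freq = apriori(baskets, min_support=2)
    print(freq[frozenset({"bread", "milk"})])  # 2
```

Pruning candidates whose subsets are not all frequent keeps each counting job small. This matters on Hadoop because every pass rescans the full transaction set.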
Keywords: Hadoop Platform, Ontology Mapping, Heterogeneous Data Integration, MapReduce, Data Mining