
Application Research on Multi-Source Data Mining Technology Based on Hadoop

Posted on: 2016-07-15
Degree: Master
Type: Thesis
Country: China
Candidate: H A Zhang
GTID: 2298330467993310
Subject: Software engineering
Abstract/Summary:
The rapid development of Internet technology has made the importance of data increasingly apparent. Research institutions and enterprises alike now attach far greater importance to data than before. How to extract information that is valuable to research or business from large volumes of data has become a focus of current work on data analysis and mining.

Continuous improvement in data processing technology has allowed all industries to experience the value of data through analysis and mining. However, as data volumes grow rapidly and data takes increasingly diverse forms, data mining becomes more difficult. In particular, when decentralized business databases must be integrated, building the data warehouse consumes considerable time, and the performance and efficiency of the subsequent data analysis are often relatively poor. How to build a data mining platform on inexpensive machines for dispersed business data has therefore become a hot topic for research institutes and companies.

Hadoop has been recognized by the industry for its storage and computing capacity when handling large amounts of business data. Hadoop is a distributed data processing framework with natural advantages when dealing with distributed databases, and the data mining products in its ecosystem connect seamlessly with one another, which largely resolves compatibility problems between them. These advantages make it possible to build a platform, suited to the business environment, for mining large volumes of dispersed data.

This thesis first compares related data mining technologies and examines the advantages and disadvantages of Hadoop-based approaches. Then, taking business needs and the actual situation into account, it selects multi-source data mining based on Hadoop as its research direction.
A data mining platform based on Hadoop is then designed, divided into four stages and three function libraries. The data processing system handles data loading, data warehouse construction, and data integration; this process mainly uses Sqoop to load data from multiple data sources, Hive to build the data warehouse, and HQL statements for the related data queries and retrieval. The core algorithm library performs the data mining and stores the mining results; at this stage the library is implemented mainly from Mahout's data mining algorithms together with algorithms rewritten on the MapReduce programming model, and the results are stored in HBase for display on the front-end page. The design rule library completes the data mining work by presenting the mining results to the user in visual form.

Finally, the mining results are verified against the characteristics of the business, which reflects the value of the mining work. This process uses Highcharts charts to display the results, making them visual and easy for users to accept. Through this design, a Hadoop-based platform for multi-source data mining is built in an attempt to solve the problem of mining data from multiple data sources.
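
As an illustration of what rewriting a mining algorithm on the MapReduce programming model can look like, the following minimal sketch runs one k-means clustering iteration as map, shuffle, and reduce steps. It is an in-memory Python simulation and not the thesis's implementation: it uses neither the Hadoop nor the Mahout API, and every function name and data value in it is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

Point = Tuple[float, ...]


def nearest(point: Point, centroids: List[Point]) -> int:
    """Return the index of the centroid closest to `point` (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda i: sum((p - c) ** 2 for p, c in zip(point, centroids[i])),
    )


def map_phase(points: Iterable[Point], centroids: List[Point]) -> Iterator[Tuple[int, Point]]:
    """Map step: emit (cluster_id, point) for every input record."""
    for point in points:
        yield nearest(point, centroids), point


def shuffle(pairs: Iterable[Tuple[int, Point]]) -> Dict[int, List[Point]]:
    """Shuffle step: group emitted values by key, as the framework would do between map and reduce."""
    groups: Dict[int, List[Point]] = defaultdict(list)
    for cluster_id, point in pairs:
        groups[cluster_id].append(point)
    return groups


def reduce_phase(groups: Dict[int, List[Point]], centroids: List[Point]) -> List[Point]:
    """Reduce step: replace each centroid with the mean of the points assigned to it.

    A centroid that received no points keeps its previous position.
    """
    updated = list(centroids)
    for cluster_id, members in groups.items():
        dims = len(members[0])
        updated[cluster_id] = tuple(
            sum(m[d] for m in members) / len(members) for d in range(dims)
        )
    return updated


def kmeans_iteration(points: Iterable[Point], centroids: List[Point]) -> List[Point]:
    """Run one full map -> shuffle -> reduce pass and return the updated centroids."""
    return reduce_phase(shuffle(map_phase(points, centroids)), centroids)
```

Calling `kmeans_iteration` repeatedly until the centroids stop moving gives the usual k-means loop; on a real cluster each iteration would be one MapReduce job, with the current centroids distributed to every mapper.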
Keywords/Search Tags: Data Warehouse, Data Mining, Clustering Mining, Association Rules