
Research And Application Of Data Mining And Visualization Based On Hadoop

Posted on: 2019-09-30
Degree: Master
Type: Thesis
Country: China
Candidate: Y Li
Full Text: PDF
GTID: 2428330566991410
Subject: Electronic and communication engineering
Abstract/Summary:
With the rapid development of the Internet, the amount of global data continues to grow, and its sheer volume now far exceeds what people can process, which makes it hard for readers to extract core information quickly and efficiently. Data analysis, data mining, and visualization of the results have therefore become an urgent need. Hadoop is the most widely used platform for large-scale distributed data processing. Combining its MapReduce parallel computing technology with data mining algorithms improves the execution speed of the algorithms and enables rapid exploration of unknown information in large-scale data. The processing results and their visual representation give enterprises the opportunity to gain real-time business insights and provide decision makers with sound data.

Based on an analysis of the key business-logic issues of an e-commerce system, this thesis designs a big data processing system against the background of the beauty-industry e-commerce system of Shaanxi Chuangmei Group. Data acquisition, data preprocessing, data mining analysis, and result visualization are studied systematically in order to build a complete big data processing system. The traditional K-means clustering algorithm and the Apriori association rule algorithm are combined with the Hadoop cloud platform to parallelize both algorithms. The main contents of the study are:

(1) Business logic is combined with MapReduce parallel processing to compute sales performance statistics and to identify the most popular projects and products; the results are displayed intuitively with calendar maps and word clouds.

(2) Mahout, the algorithm library that is a Hadoop subproject, is studied, and the K-means clustering algorithm is run through the Mahout API to perform a cluster analysis of customers' consumption and of beauticians' service records. Customers and beauticians are divided into levels, and bubble charts show the grading.

(3) Using customers' project purchase records, the Apriori association algorithm is ported to the Hadoop platform to mine latent associations among the items customers purchase, and an association diagram shows the connections between purchased items.

Porting the data mining algorithms to the Hadoop platform allows large-scale data to be processed quickly and efficiently and uncovers latent information in the data set. The visualization plug-in ECharts provides an intuitive display of the analysis results, completing the big data processing system. Mining and processing the data of the company where the author interned produced useful results and gave strong support to the company's operational decision-making.
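As an illustration of how K-means fits the MapReduce model described in the abstract, the following is a minimal single-machine sketch in plain Python. It is not the thesis's implementation and does not use the Mahout API. The function names (`map_assign`, `reduce_mean`, `kmeans_iteration`) and the sample data are hypothetical. The map step assigns each point to its nearest centroid, and the reduce step averages the points in each group to produce the updated centroids.

```python
from collections import defaultdict
from math import dist  # Euclidean distance, Python 3.8+


def map_assign(point, centroids):
    """Map step (hypothetical name): emit (index of nearest centroid, point)."""
    nearest = min(range(len(centroids)), key=lambda i: dist(point, centroids[i]))
    return nearest, point


def reduce_mean(points):
    """Reduce step (hypothetical name): average the points of one cluster."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def kmeans_iteration(points, centroids):
    """Run one K-means iteration as a map phase followed by a reduce phase."""
    groups = defaultdict(list)
    for p in points:
        key, value = map_assign(p, centroids)
        groups[key].append(value)  # stands in for the shuffle by key
    # A centroid that received no points is kept unchanged.
    return [
        reduce_mean(groups[i]) if i in groups else tuple(centroids[i])
        for i in range(len(centroids))
    ]


# Made-up sample data: two obvious groups of 2-D points.
points = [(1.0, 1.0), (1.0, 3.0), (9.0, 9.0), (11.0, 9.0)]
centroids = [(0.0, 0.0), (10.0, 10.0)]
new_centroids = kmeans_iteration(points, centroids)
print(new_centroids)  # [(1.0, 2.0), (10.0, 9.0)]
```

In a real Hadoop job the map and reduce steps would run on separate nodes and the iteration would repeat until the centroids stop moving. Here the loop over `points` plays the role of the distributed map phase.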
Keywords/Search Tags: Hadoop, data mining, K-means, Apriori, visualization techniques