
Distributed Multiple Data Sources E-commerce Data Fusion And Analysis System

Posted on: 2017-06-25 | Degree: Master | Type: Thesis
Country: China | Candidate: J Zhang | Full Text: PDF
GTID: 2348330518496596 | Subject: Electronic Science and Technology
Abstract/Summary:
With the rapid development of the Internet and the logistics industry, and with the spread of smart mobile devices, e-commerce plays an increasingly important role in people's daily lives and in the national economy. E-commerce platforms, as the carriers of online shopping, hold large amounts of valuable data. This data can be used to reconstruct the user's online shopping environment and analyze how that environment affects user behavior, to uncover patterns in the commodity market and advise businesses, and to assess the state of the national economy, so it has high research value.

E-commerce data mining is characterized by difficult data collection and preprocessing and by diverse data types, and this thesis carries out specific data mining work with these characteristics in mind. To address the low reliability and completeness of data from a single e-commerce source, the thesis designs a multi-source data fusion scheme that applies an optimized matching scheme to the various parameter information of e-commerce items to obtain fused data. Fusion improves the credibility and completeness of the data, and compared with data from a single source it improves prediction accuracy. In addition, a standalone data mining system cannot meet the demands of mining massive e-commerce data, and the traditional hierarchical clustering algorithm is inefficient when implemented on Hadoop. The thesis therefore designs a Hadoop-based e-commerce data analysis and mining system, and improves and implements hierarchical clustering on Hadoop. Experiments verify that the improved hierarchical clustering algorithm on Hadoop is much more efficient than the traditional implementation. Furthermore, when commodity parameter data is unavailable, the similarity between e-commerce brands is computed indirectly from the access logs of e-commerce users, and the improved hierarchical clustering algorithm is used to cluster the brands. The clustering results are then used to predict user behavior, and the results show that both this way of computing brand similarity without brand parameter information and the resulting clusters are usable.

The main work of this thesis falls into three parts:

(1) Analysis and mining of e-commerce data. Chapter 3 treats e-commerce data in terms of data definition, data collection, data preprocessing, data analysis, data mining, and presentation of results. To deal with e-commerce data that is unstructured, non-standardized, and noisy, solutions are developed starting from data definition, data acquisition, and data preprocessing. A variety of data mining methods are then applied to the data to reach useful conclusions.

(2) Fusion of data from multiple e-commerce sources and analysis of the fused data. Chapter 4 takes the data from Chapter 3, extracts commodity parameters (commodity name, attribute names, and attribute values), and designs a method that uses this information to fuse the e-commerce data. The method is an unsupervised learning algorithm that learns from seed features to find matched records, then uses those matches to find further matches. Compared with single-source data, the fused data is more complete and accurate. The fused data is then used to predict commodity parameters, with higher prediction precision than single-source data.

(3) Design of a distributed e-commerce data mining system based on Hadoop, and improvement and optimization of hierarchical clustering on Hadoop. Traditional agglomerative hierarchical clustering requires many iterations, which makes a parallel implementation on Hadoop inefficient. We propose an improved hierarchical clustering method that changes the order in which clusters are merged without changing the final clustering result, so that multiple clusters are merged in a single MapReduce operation. This reduces the number of iterations and improves computational efficiency. Experiments show that, compared with the traditional hierarchical clustering algorithm implemented on Hadoop, the improved algorithm greatly reduces both the number of iterations and the computation time.
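
The abstract does not say how point (3) reorders the merges. One known way to merge several clusters per pass without changing the outcome is to merge every pair of clusters that are each other's nearest neighbour in the same round. The sketch below illustrates that idea and is not the thesis's algorithm. It assumes single-linkage distance, distinct 1-D points, and a distance threshold as the stopping rule, all chosen for illustration. The function names are hypothetical, and the "map" and "reduce" steps are plain in-process Python, not Hadoop jobs.

```python
from itertools import combinations


def single_link(a, b):
    """Single-linkage distance between two clusters of 1-D points."""
    return min(abs(x - y) for x in a for y in b)


def greedy_clustering(points, threshold):
    """Baseline: merge only the single closest pair per iteration."""
    clusters = [frozenset([p]) for p in points]
    iterations = 0
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=lambda ab: single_link(*ab))
        if single_link(a, b) > threshold:
            break
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
        iterations += 1
    return set(clusters), iterations


def batched_clustering(points, threshold):
    """Merge every mutual-nearest-neighbour pair in the same round."""
    clusters = [frozenset([p]) for p in points]
    rounds = 0
    while len(clusters) > 1:
        # "Map" step: each cluster picks its nearest neighbour. Ties are
        # broken by a symmetric key so the closest pair is always mutual.
        nearest = {
            c: min(
                (o for o in clusters if o != c),
                key=lambda o: (single_link(c, o), sorted((min(c), min(o)))),
            )
            for c in clusters
        }
        # "Reduce" step: keep pairs that chose each other and are close enough.
        pairs = {
            frozenset([c, n])
            for c, n in nearest.items()
            if nearest[n] == c and single_link(c, n) <= threshold
        }
        if not pairs:
            break
        merged = {c for pair in pairs for c in pair}
        clusters = [c for c in clusters if c not in merged] + [
            frozenset().union(*pair) for pair in pairs
        ]
        rounds += 1
    return set(clusters), rounds


if __name__ == "__main__":
    points = [0.0, 1.0, 2.5, 10.0, 12.0, 30.0, 31.5]  # toy data, not from the thesis
    print(greedy_clustering(points, threshold=3.0))
    print(batched_clustering(points, threshold=3.0))
```

On this toy input both functions return the same three clusters, but the baseline needs four merge iterations while the batched version needs two rounds. In a real MapReduce setting each round would correspond to one job, which is where the saving in iterations would come from.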
Keywords/Search Tags: e-commerce data mining, data fusion, Hadoop, hierarchical clustering