
Study On Data Mining Platform Based On Cloud Computing

Posted on: 2016-02-17
Degree: Master
Type: Thesis
Country: China
Candidate: Y H Fang
Full Text: PDF
GTID: 2298330467497339
Subject: Computer support collaborative work technology
Abstract/Summary:
Driven by the rapid development of information technology and the Internet, billions of people now use the Internet every day, and their operations and behaviors generate large volumes of information and data. According to statistics, the annual growth of the global data volume is measured in zettabytes (1 ZB = 10^21 bytes), and nearly 90% of all data was produced in recent years alone. By 2020, the global data volume is expected to reach 40 ZB. The growth of data has moved people out of an era of "lack of information" and into an era of "information overload".

People want to store this data and, by analyzing it, extract its hidden patterns and value. This raises new challenges: 1) big data storage; 2) big data processing; 3) data mining methods.

Companies such as Google, Baidu, and Alibaba, along with many other Internet companies that face large amounts of data every day, are building cloud platforms. They run distributed computing systems in the back end to store and process massive amounts of data, and they apply data mining techniques to analyze that data in order to optimize their business systems. The need is not limited to the Internet industry; many other industries also have a strong demand for big data processing and data mining. For example, hospitals can use data mining methods to analyze patient cases and make more precise diagnoses and treatment decisions; weather bureaus can produce more accurate forecasts by processing large volumes of data; and retailers can analyze transaction records to optimize their products, attract more customers, and raise their profit margins.

Against this background, this thesis combines cloud computing and data mining technology to design and build a data mining platform based on cloud computing.
The significance of this platform lies in its cloud computing model: the platform provides big data mining and big data storage as cloud services, and users only need to call the service application interfaces that the platform exposes. This lets users concentrate on their own business logic and reduces the data mining costs of systems in other sectors. The platform consists of three layers:

1) The base support layer. Taking into account the need for a distributed system that can store and process massive data, the thesis uses HDFS and MapReduce from the Hadoop distributed computing framework to build the infrastructure support layer, which provides data storage and computing capability.
2) The service layer. This layer encapsulates the data mining algorithms and the operations on data. Because its interfaces need to be simple, efficient, and consistent, the thesis designs them according to the REST interface design specification and exposes them as services to the upper-layer applications.
3) The user application layer. The platform offers data mining and large-capacity data storage externally as a service; user applications obtain this service by calling the service layer interfaces.

Because recommender systems based on data mining are widely used in industry, the thesis chooses the collaborative filtering algorithm as the example algorithm to run on the data mining platform.
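As a rough illustration of how a collaborative filtering step might be expressed in the MapReduce style used by the base support layer, the sketch below counts item co-occurrences with explicit map, shuffle, and reduce phases. It is an in-memory Python simulation, not Hadoop code, and it is not taken from the thesis; the function names (`map_user`, `shuffle`, `reduce_pair`, `cooccurrence`) and the sample data are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def map_user(user, items):
    """Map step: emit ((item_a, item_b), 1) for each pair of items one user interacted with."""
    for a, b in combinations(sorted(set(items)), 2):
        yield (a, b), 1


def shuffle(pairs):
    """Shuffle step: group emitted values by key, as a MapReduce framework would do."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_pair(key, values):
    """Reduce step: sum the counts collected for one item pair."""
    return key, sum(values)


def cooccurrence(user_items):
    """Run the three phases in memory and return {(item_a, item_b): count}."""
    emitted = (kv for user, items in user_items.items() for kv in map_user(user, items))
    return dict(reduce_pair(k, v) for k, v in shuffle(emitted).items())


# Hypothetical toy data: which items each user interacted with.
ratings = {"u1": ["A", "B", "C"], "u2": ["A", "B"], "u3": ["B", "C"]}
counts = cooccurrence(ratings)
```

On a real cluster the map and reduce functions would run on separate nodes over data stored in HDFS; the simulation only shows how the computation splits into independent per-user and per-pair work.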
The thesis also proposes an improved collaborative filtering algorithm that introduces an item popularity weighting factor into the similarity calculation, weakening the influence of popular items and thereby increasing the degree of personalization of the recommender system. A prototype of the user application layer is implemented in PHP, and the MovieLens dataset is used for the experiments. The collaborative filtering algorithm was tested on the cloud-based data mining platform, and the traditional and improved algorithms were each evaluated on precision, recall, coverage, and popularity. The experimental data lead to the conclusion that the improved algorithm performs slightly better than the traditional collaborative filtering algorithm.
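The abstract does not state the exact weighting factor, so the following is only a minimal sketch of one common way to weaken popular items in a similarity calculation: each co-rated item contributes 1 / log(1 + n), where n is the number of users who interacted with that item, instead of contributing 1. The choice of a user-based similarity, the logarithmic penalty, and all names (`user_similarity`, `precision_recall`, the toy data) are assumptions made for illustration and may differ from the thesis's method.

```python
import math
from collections import defaultdict


def user_similarity(user_items, penalize_popular=True):
    """Cosine-style user similarity; optionally down-weight popular items (assumed penalty)."""
    # Invert the data: item -> set of users who interacted with it.
    item_users = defaultdict(set)
    for user, items in user_items.items():
        for item in items:
            item_users[item].add(user)

    sim = defaultdict(dict)
    for item, users in item_users.items():
        # A popular item (many users) contributes less to every pair that shares it.
        weight = 1.0 / math.log(1.0 + len(users)) if penalize_popular else 1.0
        for u in users:
            for v in users:
                if u != v:
                    sim[u][v] = sim[u].get(v, 0.0) + weight

    # Normalize by the sizes of both users' item sets.
    for u, row in sim.items():
        for v in row:
            row[v] /= math.sqrt(len(user_items[u]) * len(user_items[v]))
    return sim


def precision_recall(recommended, held_out):
    """Precision and recall of recommendation lists against held-out test items."""
    hits = total_rec = total_rel = 0
    for user, relevant in held_out.items():
        recs = recommended.get(user, [])
        hits += len(set(recs) & set(relevant))
        total_rec += len(recs)
        total_rel += len(relevant)
    precision = hits / total_rec if total_rec else 0.0
    recall = hits / total_rel if total_rel else 0.0
    return precision, recall


# Hypothetical toy data: "hit" is popular, "x" and "y" are niche items.
data = {"u1": {"hit", "x"}, "u2": {"hit", "x"}, "u3": {"hit", "y"}}
plain = user_similarity(data, penalize_popular=False)
penalized = user_similarity(data, penalize_popular=True)
```

In this toy data, u1 and u2 share a niche item while u1 and u3 share only the popular one; with the penalty enabled, the similarity of u1 to u2 grows relative to its similarity to u3, which is the intended personalization effect.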
Keywords/Search Tags: Cloud Computing, Data Mining, Collaborative Filtering Recommender, Recommender System, REST, Hadoop