
Research And Implementation Of Data Mining And Analysis On Online Entertainment Platforms Based On Hadoop

Posted on: 2018-09-06
Degree: Master
Type: Thesis
Country: China
Candidate: C Gao
Full Text: PDF
GTID: 2348330518499071
Subject: Engineering
Abstract/Summary:
With the rapid improvement of network facilities, the online live entertainment broadcasting industry, which is characterized by realism and high audience involvement, has become more and more popular. Under the influence of the market economy, online entertainment platforms have gradually become an important venue for advertising. Advertising depends on the anchors' live broadcasts, so how to recommend suitable advertisements to these anchors has become a topic of common interest for anchors and advertisers. At present, a good approach is to tag the anchors and then recommend advertisements based on those tags. To avoid the low efficiency of mining on a single machine, the Hadoop platform is adopted to mine the anchors' data from online entertainment platforms in parallel. We drew conclusions about the specific differences in fan and audience-interaction data among the different kinds of anchors. These conclusions can provide scientific data for generating the anchors' personalized tags and will facilitate the follow-up advertising recommendation work. The research in this paper is part of an enterprise project, an advertising recommendation system for anchors with personalized tags, so it has a practical application background in the enterprise. Through the study and improvement of the classical Apriori algorithm from the field of association rule mining, we carried out the data mining work on the Hadoop platform. The main contents of this paper are as follows:

(1) We studied and analyzed the architecture of Hadoop, focusing on the features and principles of its two core parts: HDFS (the distributed file system) and MapReduce (the distributed programming model). We then studied the principles of crawler technology and the related background of data mining. Based on the business needs of the data mining task and the characteristics of the data, we decided to use association rule mining for data mining and analysis in this paper.

(2) We studied and designed a scheme for crawling data from the online entertainment platforms. Specifically, we designed and implemented a distributed crawler that collects information from the online entertainment platforms and stores it in a remote database.

(3) We researched and analyzed the characteristics and principles of the Apriori algorithm. Apriori may generate a large number of candidate sets, and it needs to scan the entire database to verify each round of candidates, which is a shortcoming. In this paper, by introducing temporary tables, we accepted a small additional cost in time and space in exchange for greatly reducing the number of full database scans and the number of candidate sets generated, and we put forward an improved Apriori algorithm. The performance gain is more obvious when the improved algorithm is applied to very large data sets, so it is well suited to data mining on online entertainment platforms.

(4) We designed and implemented a data mining system for the anchors' data from the online entertainment platforms. This work builds on the data already obtained from those platforms. First, we built a Hadoop platform. Then we migrated the anchors' information from the online entertainment platforms to the data warehouse on the Hadoop platform. Finally, we used the improved Apriori algorithm to mine the data in the data warehouse and drew a conclusion.

In this research, we used web crawler technology, data migration, and related techniques to mine the data of live online entertainment platforms effectively, and drew the following conclusion: at present, anchors in the games, entertainment, and outdoor categories are more popular. The specific data can serve as a scientific reference for generating the anchors' personalized tags.
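To make the Apriori shortcoming described in item (3) concrete, the following is a minimal sketch of the classical Apriori algorithm in Python. It is not the improved temporary-table variant proposed in the thesis, whose details are not given in this abstract. The function name `apriori`, the `scans` counter, and the sample anchor attributes are all hypothetical and chosen only for illustration. The sketch shows that every candidate length requires another full pass over the data.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Classical Apriori (illustrative sketch, not the thesis's improved variant).

    Returns (frequent, scans): a dict mapping each frequent itemset to its
    support count, and the number of full passes made over the transactions.
    """
    transactions = [frozenset(t) for t in transactions]
    scans = 0

    # Pass 1: count every single item.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    scans += 1

    current = {s: c for s, c in counts.items() if c >= min_support}
    frequent = dict(current)
    k = 2

    while current:
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = set()
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) != k:
                continue
            # Prune step: every (k-1)-subset of a candidate must be frequent.
            if all(frozenset(sub) in current
                   for sub in combinations(union, k - 1)):
                candidates.add(union)
        if not candidates:
            break

        # Verifying the candidates costs one more full scan of the data.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        scans += 1

        current = {s: c for s, c in counts.items() if c >= min_support}
        frequent.update(current)
        k += 1

    return frequent, scans


# Hypothetical sample data: each record is the attribute set of one anchor.
sample = [
    {"game", "many_fans", "high_interaction"},
    {"game", "many_fans"},
    {"outdoor", "high_interaction"},
    {"game", "high_interaction", "many_fans"},
    {"entertainment", "many_fans"},
]

frequent_itemsets, scans = apriori(sample, min_support=2)
for itemset, support in sorted(frequent_itemsets.items(),
                               key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), support)
print("full scans:", scans)
```

Because the number of scans grows with the length of the longest frequent itemset, and the candidate set can grow quickly at each level, this baseline becomes expensive on large data, which is the cost the thesis aims to reduce.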
Keywords/Search Tags: Hadoop, Web crawler, Temporary table, Mining association rules