Font Size: a A A

Meticulous Analysis Based On Hadoop And Its Application

Posted on:2016-04-03Degree:MasterType:Thesis
Country:ChinaCandidate:L L JiFull Text:PDF
GTID:2308330473965482Subject:Computer technology
Abstract/Summary:PDF Full Text Request
In recent years, the Internet has gradually replaced traditional media, such as newspapers, periodicals, etc., to become the world’s largest media. With the development of Internet media, ads and click have gradually become the main source of income in a variety of interesting Web sites and applications. Compared to any kind of traditional media, the Internet can provide an inexhaustible "page" for people in the aspects of capacity. So website owners and advertisers are increasingly concerned on Web advertising. Different from traditional media, ads can be customized on the Web, which are not allowed to hard coal interface. Web owners can take advantage of the users information to determine which ads should be displayed to the user, regardless of which page they are browsing. This year’s advertising industry report shows that advertisers have gradually shifted from buying advertising to the purchase of advertising crowd on web site. Therefore it is not only a great opportunity but also a challenge that the advertising industry enterprises are facing to detailedly analyze user’s ads group.However TB level even PB level massive behavioral datas will be generated every day by hundreds of millions of users’ surfing internet,at the same time, a large number of advertisers have the demands for advertising. It is unable to meet the demand to process the extremely large data fastly by using the traditional single host to store, match and analyze the users data and ads.Thus, it is an inevitable direction of development to use distributed storage and computing.Hadoop is a distributed computing platform,using the HDFS distributed file system and MapReduce distributed computing framework as the core, with high reliability, high scalability, efficiency, high fault tolerance,and applicable to large-scale data collection analysis and processing. Due to the outstanding advantages, Hadoop-based applications have been springing up all over in the Internet field, such as web log analysis, search engines, data mining, and it has made a very outstanding results.Based on the above background, this paper presents the CURE algorithms based on the Hadoop distributed platforms and cluster analysis algorithms of data mining to research and implement the detailedly analysis and calculation of the users group according to the user’s browsing advertising behavior, clicking advertising behavior, downloading advertising behavior and forwarding advertising behavior four aspects. It takes full advantage of the Mapreduce advantage in dealing with massive amounts of data and stores massive data in large-scale distributed file system HDFS which is suitable for cluster computing.
Keywords/Search Tags:Detailed Analysis Crowd, Hadoop, HDFS, MapReduce, CURE Cluster Analysis Algorithm
PDF Full Text Request
Related items