The Shared Data Service Statistical Information Extraction And Visualization

Posted on: 2013-10-24
Degree: Master
Type: Thesis
Country: China
Candidate: X S Ding
Full Text: PDF
GTID: 2248330371982604
Subject: Computer Science and Technology
Abstract/Summary:
The Meridian Science Data Center is designed to provide standard space science data and related services to researchers nationwide, with the aim of enhancing China's core competitiveness in the field of space science. As the supporting infrastructure continues to improve, more and more researchers are taking up the study of space science. The number of users of the space data center has grown dramatically, and their requirements of the data center have risen accordingly. Statistical analysis of user access behavior can reveal users' real interests, which makes it possible to provide differentiated services.

Two years of uninterrupted service have left the data center with a large amount of accumulated data, including user information, data usage information, system status, and many other kinds of business data. How to take full advantage of these seemingly messy data is a topic of great concern to data center managers. Web mining can make full use of these data and uncover the patterns hidden behind the raw records. With these patterns presented in a large number of charts, data center managers can grasp the operational status of the center and set long-term development goals according to the analysis results, thereby improving the quality of service and advancing the development of the data center.

To provide better services for users, analysis and statistics of user behavior have become increasingly important. We are eager to know which data users need most and which kinds of users are most likely to use our system. Based on an in-depth analysis of project needs and implementation benefits, this thesis presents MDRS, the Meridian Data Center Reporting System. It is an information extraction and visualization reporting system based on the Hadoop ecosystem. Its functions include data collection, data cleaning, data analysis, data recommendation, and data visualization.

This thesis first introduces the background of MDRS and explains why its implementation needs to be built on the Hadoop platform. It then describes the format and meaning of the business log and the valuable knowledge that can be obtained by analyzing it. Next, it describes the functional design and implementation of MDRS, highlighting four parts: first, the deployment and installation of Chukwa in the data acquisition module; second, the collaborative filtering algorithm and its MapReduce implementation in the data analysis module; third, the different display schemes designed for different groups of users in the data visualization module; and fourth, the performance optimization measures for MDRS. Finally, the work is summarized and possible system upgrades are discussed.
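
The abstract does not say which variant of collaborative filtering the data analysis module uses. As a rough illustration only, the following Python sketch shows one common form, item-based filtering from co-occurrence counts, arranged as a map phase and a reduce phase. It is not the thesis's implementation and does not use Hadoop or Mahout APIs; the function names, the `access_log` structure, and the dataset identifiers are all hypothetical.

```python
from collections import defaultdict
from itertools import permutations


def map_cooccurrence(user_items):
    """Map phase: emit ((item_a, item_b), 1) for each ordered pair of
    distinct items that a single user accessed."""
    for items in user_items.values():
        for a, b in permutations(sorted(set(items)), 2):
            yield (a, b), 1


def reduce_counts(pairs):
    """Reduce phase: sum the counts emitted for each item pair."""
    counts = defaultdict(int)
    for key, value in pairs:
        counts[key] += value
    return counts


def recommend(user, user_items, counts, top_n=3):
    """Score unseen items by how often they co-occur with the user's items."""
    seen = set(user_items[user])
    scores = defaultdict(int)
    for (a, b), count in counts.items():
        if a in seen and b not in seen:
            scores[b] += count
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda item: (-scores[item], item))[:top_n]


# Hypothetical access log (user -> datasets accessed); not real MDRS data.
access_log = {
    "u1": ["magnetometer", "ionosonde", "lidar"],
    "u2": ["magnetometer", "ionosonde"],
    "u3": ["ionosonde", "lidar"],
    "u4": ["magnetometer"],
}

counts = reduce_counts(map_cooccurrence(access_log))
print(recommend("u4", access_log, counts))
```

In a real MapReduce job the pairs emitted by the map phase would be grouped by key across machines before being summed; here both phases run in one process to keep the sketch self-contained.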
Keywords/Search Tags: Visualization, Hadoop, Log mining, Mahout Recommend