
Design And Implementation Of Service Data Analysis Subsystem Based On Hadoop

Posted on: 2015-11-07  Degree: Master  Type: Thesis
Country: China  Candidate: D Wei  Full Text: PDF
GTID: 2298330467463070  Subject: Computer technology
Abstract/Summary:
The rapid development of the Internet and the information society has produced a huge amount of data. This growth raises the requirements for computation and storage, but the data also holds great value, so demand from business operations is increasing as well. The user behavior characteristics and product usage patterns contained in these data are an important source of information for enterprises trying to grasp market trends. In a traditional data analysis platform, however, the data is stored in a relational database, and because tasks run on a single server while the data keeps growing, computing speed, storage capacity, and scalability are all severely limited. This thesis therefore designs and implements a data analysis system based on the Hadoop distributed architecture.

The thesis first introduces the background of the subject and the related technologies, including Hadoop, Hive, Flume, and Redis. Based on user requirements, it then analyzes the functional and non-functional requirements with use case diagrams. The system is divided into five modules: data collection, data preprocessing, data processing, monitoring, and data presentation. Based on the requirements of each module, the thesis describes the design and implementation of each in detail, with sequence diagrams and code. Every stage of the data flow is monitored to ensure data consistency. Finally, the system is tested, and the results indicate that it meets its design goals.

A data analysis system based on a distributed architecture overcomes the shortcomings of traditional databases, including slow computation and limited storage space. It splits large files into pieces across the servers of the distributed file system, spreading the load and improving computational efficiency. The system also provides good backup and error recovery mechanisms. It is currently in use at a company.
Keywords/Search Tags: Data Analysis, Distributed, Hadoop, Hive, Flume