
The Implementation And Optimization Of Log Analysis System Based On Hive

Posted on: 2018-04-12
Degree: Master
Type: Thesis
Country: China
Candidate: J H Wang
GTID: 2348330536479767
Subject: Electronic and communication engineering
Abstract/Summary:
The rapid spread of smart mobile devices has driven the growth of the mobile Internet, and demand for mobile applications and services is rising with it. Internet companies, e-commerce platforms, and traditional service industries are shifting their focus to mobile applications to meet this demand. By collecting and analyzing large volumes of user behavior logs, they can derive users' behavior trajectories, characteristics, and preferences, provide customized services tailored to those characteristics, improve the user experience, and ultimately expand their market share.

Traditional technology cannot store and process user behavior logs that have grown to the TB and even PB level; the Hadoop distributed system emerged to overcome this limitation. Motivated by the massive log analysis needs of mobile Internet companies, this thesis studies, analyzes, and optimizes a Hive-based log analysis system that combines Hadoop with traditional data processing technology. It introduces the key technologies of massive log analysis in a big data setting: the Hadoop Distributed File System, the distributed computing frameworks MapReduce and Spark, Hadoop-based data warehousing, and the data migration tool Sqoop. It also discusses commonly used big data platforms and their advantages.

Starting from actual business requirements, we analyze the massive log processing workflow and the performance of the Hive-based log analysis system, and identify the system's performance bottlenecks. Based on the business requirements, data characteristics, and system architecture, we optimize the system in four areas: system framework, data integration, data storage, and data processing. The optimization scheme is studied and tested, and the test results verify its feasibility and effectiveness.
Keywords/Search Tags: Hive, Data Warehouse, Log Analysis, Big Data, Performance Optimization