The Design And Implementation Of Stream-processing-based Log Analysis System

Posted on: 2017-03-30
Degree: Master
Type: Thesis
Country: China
Candidate: Z X Li
GTID: 2428330590968455
Subject: Software engineering
Abstract/Summary:
With the rapid development of the mobile internet in recent years, big data is produced more and more easily. At internet companies, distributed systems are becoming increasingly complicated, because services are usually deployed in large-scale distributed clusters. These applications and services are composed of many individual software modules, which may be developed in different programming languages by several teams and deployed on hundreds or thousands of servers across multiple data centers. Once an issue arises in one or more services in the production environment, tracking it down is difficult: it requires communication and coordination across departments, and may even require troubleshooting every software module of the affected services. Yet problems in the production environment need to be located and solved in a short time.

To address this pain point, a company needs tools that can understand and analyze system behavior and performance. Major companies in China and abroad are building APM (Application Performance Monitor) systems, such as New Relic and Splunk abroad, and Taobao's Hawkeye and Dianping's CAT in China. Motivated by the issues above, this thesis implements a similar APM system. It uses near-real-time Spark Streaming to analyze and compute over logs along different dimensions, writes the results to HBase via OpenTSDB, a scalable time-series database, and finally presents the data as charts and reports. Developers and operators can view the data they care about, troubleshoot quickly, and work more efficiently, so that they can pay more attention to the business. Managers can also refer to these data when making decisions.

This thesis completes the following main research tasks:

1. Architecture and design of the log analysis system. The thesis analyzes the requirements of the log analysis system and divides it into three layers: a stream-processing layer, a data-storage layer, and a service API layer. It selects, analyzes, and designs the technical solutions involved, including the stream-processing framework, the time-series storage middleware, a convenient and effective API, and the data cache. The system can process large volumes of data quickly and efficiently, with an emphasis on availability, timeliness, and scalability.

2. Data cleaning for RESTful and gateway URLs. The logs of some domains contain a large number of URLs, including entries that are useless, that users do not care about, or that come from malicious access. These logs must be classified and filtered; if they were saved as-is, they would hurt query efficiency and even affect the design.

3. Implementation and validation of the stream-processing-based log analysis system. The thesis implements the stream analysis and computation, the storage of results, and the final data display. The system has been running for more than half a year at an internet company. The validation in the thesis shows that it handles high-volume data flows smoothly from computation to display and that its data are accurate; most importantly, it withstood the traffic peak of a major sales promotion. The system displays real-time data for each running application within 1 minute, which helps technicians troubleshoot and resolve problems in their domains quickly and thus provides strong support for the company's normal business operation.
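The pipeline described in the abstract (clean the URLs, aggregate logs by dimension per minute, write the results to a time-series store) can be illustrated with a minimal single-process sketch in plain Python. This is not the thesis's implementation, which runs on Spark Streaming. The log record layout, the static-resource filter, the `_id_` placeholder, and the metric names are all assumptions made for illustration. The output dictionaries use the `metric`, `timestamp`, `value`, and `tags` fields that OpenTSDB's HTTP `/api/put` endpoint accepts.

```python
import re
from collections import defaultdict

# Hypothetical log record: (epoch_seconds, app_name, url, latency_ms).
# The abstract does not describe the real log format; this layout is assumed.

_NUMERIC_SEGMENT = re.compile(r"^\d+$")
# Assumed examples of "useless" URLs: requests for static resources.
_NOISE_SUFFIXES = (".js", ".css", ".png", ".ico")


def normalize_url(url):
    """Collapse a RESTful URL into a template, or return None to drop it."""
    path = url.split("?", 1)[0]  # discard the query string
    if path.endswith(_NOISE_SUFFIXES):
        return None
    # Replace numeric path segments so /orders/1 and /orders/2 share one series.
    # "_id_" is used because OpenTSDB tag values do not allow braces.
    segments = [
        "_id_" if _NUMERIC_SEGMENT.match(seg) else seg
        for seg in path.split("/")
    ]
    return "/".join(segments)


def aggregate_per_minute(records):
    """Group records into 60-second buckets per (app, URL template)."""
    buckets = defaultdict(lambda: [0, 0.0])  # key -> [count, total latency]
    for ts, app, url, latency_ms in records:
        template = normalize_url(url)
        if template is None:
            continue  # filtered out by the cleaning step
        minute = ts - ts % 60
        bucket = buckets[(minute, app, template)]
        bucket[0] += 1
        bucket[1] += latency_ms
    return buckets


def to_datapoints(buckets):
    """Shape each bucket as OpenTSDB-style data points (two metrics per bucket)."""
    points = []
    for (minute, app, template), (count, total) in sorted(buckets.items()):
        tags = {"app": app, "url": template}
        points.append({"metric": "log.request.count",
                       "timestamp": minute, "value": count, "tags": tags})
        points.append({"metric": "log.request.latency_avg_ms",
                       "timestamp": minute, "value": total / count, "tags": tags})
    return points
```

In a streaming deployment, the grouping in `aggregate_per_minute` would correspond to a windowed reduce-by-key over each micro-batch, and the resulting points would be posted to OpenTSDB in batches. Collapsing URLs into templates before aggregation keeps the number of distinct tag values, and therefore the number of time series, bounded.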
Keywords/Search Tags:Massive Logs, Application Performance Monitor, Spark Streaming, OpenTSDB