
I/O Behavior Analysis Tool Research For High Performance Computing Systems

Posted on: 2020-03-15
Degree: Master
Type: Thesis
Country: China
Candidate: T Y Zhang
GTID: 2428330572983930
Subject: Computer Science and Technology
Abstract/Summary:
The rapid development of high-performance computing has greatly increased the computing power of supercomputers, but their I/O performance has improved far more slowly. At the same time, the I/O subsystem of a supercomputer has a long access path and is heavily contended by applications, which makes overall resource utilization hard to improve and degrades the application experience. For many current scientific computing applications, I/O performance rather than computing power has become the bottleneck. Analyzing the I/O behavior of large-scale applications on an architecture as complex as a supercomputer, and detecting I/O performance anomalies in the system promptly, is therefore key to optimizing large-scale application I/O performance and improving system resource utilization.

First, targeting the I/O subsystem of Sunway TaihuLight, this thesis designs and implements a set of I/O behavior analysis tools for large-scale computing systems. The tool set has three parts: an application-oriented I/O pattern analysis tool, an application-oriented front-end I/O performance data separation tool, and an automated I/O performance anomaly detection tool. The I/O pattern analysis tool characterizes an application's I/O pattern and performance so that corresponding optimizations can be triggered. The front-end performance data separation tool shows the performance of the different parts of the TaihuLight storage architecture, which makes it possible to evaluate contention and utilization in the system. The automated anomaly detection tool detects system performance anomalies in real time and supports better management of system resources.

Second, this thesis presents and analyzes I/O performance data collected on Sunway TaihuLight. Specifically, it analyzes the volume of data accessed by jobs and their I/O patterns, identifies inefficient I/O patterns, and proposes corresponding optimization suggestions for application developers and administrators. It also analyzes the results of performance anomaly detection, identifies performance anomalies that have occurred on TaihuLight, and diagnoses their root causes, such as the impact of high-interference applications and of nodes with abnormal performance. These results are of considerable help in resolving resource conflicts and in rationally allocating and planning system resources.
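The abstract does not say which detection method the thesis uses. As a rough illustration of how a node with abnormal performance might be flagged automatically, the sketch below applies a robust outlier test (median and median absolute deviation) to per-node throughput samples. The function name, node labels, sample values, and threshold are all hypothetical and are not taken from the thesis.

```python
from statistics import median


def flag_slow_nodes(throughput_mb_s, threshold=3.5):
    """Return the ids of nodes whose throughput is abnormally low versus their peers.

    throughput_mb_s: dict mapping a (hypothetical) node id to its measured
    throughput in MB/s over the same time window.
    threshold: cut-off on the modified z-score; 3.5 is a common rule of thumb,
    not a value from the thesis.
    """
    values = list(throughput_mb_s.values())
    med = median(values)
    # Median absolute deviation: a spread estimate that outliers barely affect.
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # At least half the nodes report the same value; nothing to compare against.
        return []
    flagged = []
    for node, value in throughput_mb_s.items():
        # Modified z-score; 0.6745 rescales MAD to match a standard deviation
        # for normally distributed data.
        score = 0.6745 * (value - med) / mad
        if score < -threshold:
            flagged.append(node)
    return sorted(flagged)


if __name__ == "__main__":
    # Made-up samples: node3 is far slower than the others.
    sample = {"node0": 510.0, "node1": 495.0, "node2": 502.0,
              "node3": 120.0, "node4": 507.0}
    print(flag_slow_nodes(sample))  # prints ['node3']
```

A median-based test is used here because a single very slow node would distort a mean and standard deviation computed over the same samples. A production tool would also need to handle time-varying load and interference between jobs, which this sketch ignores.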
Keywords/Search Tags: I/O behavior analysis, I/O performance isolation, performance optimization, anomaly detection