| With the rapid development of Internet technology, most applications now run on large, complex, distributed clusters that span multiple protocol layers. The technology, software, and configuration of such clusters evolve constantly, so failures are difficult to avoid. Faced with massive monitoring data and very large systems, IT (Information Technology) operation and maintenance personnel struggle to make quick and accurate decisions when handling failures. In recent years, Artificial Intelligence for IT Operations (AIOps) has improved the efficiency of IT operation and maintenance by introducing artificial intelligence techniques. In practice, however, IT operation and maintenance still faces three problems: multi-source heterogeneous data is difficult to integrate, operation and maintenance knowledge is insufficiently represented, and faults are difficult to predict accurately. Regarding data integration, existing operational status monitoring models integrate only part of the available data, which yields one-sided operation and maintenance knowledge. Regarding knowledge representation, traditional representation methods are limited to the explicit structure of knowledge and ignore the deeper meaning of operation and maintenance knowledge. Regarding fault prediction, existing methods do not incorporate operation and maintenance knowledge, so their results lack interpretability and have low reliability. To address these problems, this paper proposes an IT operation and maintenance assistant technology, which mainly includes the following: (1) An automatic method for building a component-event knowledge graph is proposed. It integrates data spanning hardware, software, logs, and operational indicators and uses a machine learning model to generate the component-event knowledge graph, which reduces the manual effort of building the knowledge graph and solves the problem that multi-source heterogeneous data is difficult to integrate. (2) A representation learning model for the component-event knowledge graph is proposed. Because an entity can carry different meanings in different contexts, entity representation is divided into a semantic representation and a structural representation, which yields a dynamic entity representation that changes with context. The model achieves the best results on the triple classification and link prediction tasks for the component-event knowledge graph and solves the problem of insufficient operation and maintenance knowledge representation. (3) A fault prediction model based on the component-event knowledge graph is proposed. The knowledge graph is used to identify the key information in an event sequence, and the best-matching fault type is predicted. This makes the prediction results finer-grained and more interpretable, and solves the problem that faults are difficult to predict accurately. (4) Based on the above work, an IT operation and maintenance assistant system based on the knowledge graph is designed and implemented. In summary, this paper uses historical data to build a knowledge graph automatically, proposes a representation learning model for this kind of knowledge graph, introduces the knowledge graph into fault prediction, and finally designs and implements a knowledge-graph-based IT operation and maintenance assistant system. |