With the development of intelligent operation and maintenance (O&M) of traction power supply systems, mining and analyzing the data generated over the whole life cycle of traction power supply equipment has become increasingly important. Applied research on the structured data produced during equipment O&M is now relatively mature. However, the large body of historical defect text for traction power supply equipment is unstructured. It must be processed manually, which is time-consuming and labor-intensive, and so it is difficult to exploit fully. Against this background, this paper studies the mining of traction power supply equipment defect text. It uses natural language processing and information extraction techniques to mine and apply these texts, so that they can be processed automatically and brought under information management, providing a rich data source for advanced applications in intelligent O&M.

The main work of this paper is as follows.

First, a field investigation was carried out to analyze the problems in processing and applying on-site defect text data for traction power supply equipment. These mainly include difficulty in processing the text efficiently, difficulty in managing it as information, and insufficient use of the important information it contains. On this basis, the requirements for defect text mining and application are proposed. Drawing on text mining techniques, a text mining framework for traction power supply equipment defects is constructed and its functions are planned. The framework offers an overall approach to the problems that defect text processing is time-consuming and labor-intensive and that the text is not fully exploited.

Second, in mining traction power supply equipment defect text, Chinese word segmentation is the key step of the
preprocessing stage. To segment defect text accurately, a dynamically updated segmentation dictionary for the traction power supply equipment defect text domain is constructed. The traditional Chinese word segmentation approach that combines a dictionary with statistics is then optimized, and a segmentation method that combines this dynamic defect text dictionary with a hidden Markov model (HMM) is studied. Using standard Chinese word segmentation evaluation metrics, the proposed method is compared with current mainstream segmentation methods to verify its effectiveness on traction power supply equipment defect text.

Finally, information mining is performed on the segmented defect text. The semantic framework and semantic elements of traction power supply equipment defect text are defined first. Because defect records differ in how they are written and some are not standardized, a defect text information mining model that accounts for missing core semantics is studied, drawing on the semantic similarity between defect texts. Next, to fill the semantic framework, an ontology dictionary of traction power supply equipment defect text is constructed from the word segmentation results and equipment guidelines, and an information mining method for the defect text is designed and implemented. In addition, statistical functions over the structured data obtained from defect text mining are implemented in software. This verifies the value of the text mining method for producing a variety of defect statistics and classifications from defect text that is
difficult to mine directly because records differ in style and some are non-standard.
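The exact way the dynamic dictionary and the HMM are combined is not described here. One common arrangement, sketched below under that assumption, lets forward maximum matching against the domain dictionary fix known terms and passes the leftover character runs to an HMM that labels each character B/M/E/S (begin, middle, end, single) and decodes with the Viterbi algorithm. The class name `DefectSegmenter`, the method `add_word`, the dictionary entries and all HMM probabilities are hypothetical toy values, not parameters from this work.

```python
import math
from typing import Dict, Iterable, List, Optional

# Toy BMES hidden Markov model in log space (illustrative values, not trained).
STATES = "BMES"
START: Dict[str, float] = {"B": math.log(0.6), "M": -math.inf, "E": -math.inf, "S": math.log(0.4)}
TRANS: Dict[str, Dict[str, float]] = {
    "B": {"M": math.log(0.3), "E": math.log(0.7)},
    "M": {"M": math.log(0.3), "E": math.log(0.7)},
    "E": {"B": math.log(0.6), "S": math.log(0.4)},
    "S": {"B": math.log(0.6), "S": math.log(0.4)},
}
UNSEEN_EMIT = math.log(1e-4)  # emission log-prob for characters without an estimate


class DefectSegmenter:
    """Hypothetical dictionary-first segmenter with an HMM fallback."""

    def __init__(self, words: Iterable[str], emit: Optional[Dict[str, Dict[str, float]]] = None):
        self.dictionary = set(words)
        self.max_len = max((len(w) for w in self.dictionary), default=1)
        # Per-state character emission log-probs; empty tables mean "all unseen".
        self.emit = emit or {s: {} for s in STATES}

    def add_word(self, word: str) -> None:
        """Dynamic update: a newly confirmed domain term is used on the next cut."""
        self.dictionary.add(word)
        self.max_len = max(self.max_len, len(word))

    def _emit(self, state: str, char: str) -> float:
        return self.emit.get(state, {}).get(char, UNSEEN_EMIT)

    def _viterbi(self, text: str) -> List[str]:
        """Tag an out-of-dictionary run with BMES labels and split it into words."""
        prob = [{s: START[s] + self._emit(s, text[0]) for s in STATES}]
        back: List[Dict[str, str]] = [{}]
        for t in range(1, len(text)):
            prob.append({})
            back.append({})
            for s in STATES:
                # Best predecessor among states that may legally transition into s.
                prev = max((p for p in STATES if s in TRANS[p]),
                           key=lambda p: prob[t - 1][p] + TRANS[p][s])
                prob[t][s] = prob[t - 1][prev] + TRANS[prev][s] + self._emit(s, text[t])
                back[t][s] = prev
        tags = [max("ES", key=lambda s: prob[-1][s])]  # the last word must end in E or S
        for t in range(len(text) - 1, 0, -1):
            tags.append(back[t][tags[-1]])
        tags.reverse()
        words, start = [], 0
        for i, tag in enumerate(tags):
            if tag in "ES":  # a word ends here
                words.append(text[start:i + 1])
                start = i + 1
        return words

    def cut(self, text: str) -> List[str]:
        """Forward maximum matching on the dictionary; unmatched runs go to the HMM."""
        words: List[str] = []
        buffer = ""
        i = 0
        while i < len(text):
            match = ""
            for size in range(min(self.max_len, len(text) - i), 0, -1):
                if text[i:i + size] in self.dictionary:
                    match = text[i:i + size]
                    break
            if match:
                if buffer:  # flush the pending unknown run through the HMM
                    words.extend(self._viterbi(buffer))
                    buffer = ""
                words.append(match)
                i += len(match)
            else:
                buffer += text[i]
                i += 1
        if buffer:
            words.extend(self._viterbi(buffer))
        return words
```

With the dictionary `["接触网", "绝缘子", "破损", "更换", "断裂"]`, `cut("接触网绝缘子破损需更换")` returns `["接触网", "绝缘子", "破损", "需", "更换"]`; the unmatched `需` is a one-character run, so the HMM can only tag it S. Calling `add_word("吊弦")` makes that term match directly on the next call, which is the point of keeping the dictionary dynamic.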
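The specific evaluation metrics are not named here; assuming the usual word-level precision, recall and F1, a predicted word counts as correct only if its character span matches a gold word exactly. The helper names `to_spans` and `prf` are illustrative.

```python
from typing import List, Set, Tuple


def to_spans(words: List[str]) -> Set[Tuple[int, int]]:
    """Map a segmentation to the set of (start, end) character offsets of its words."""
    spans, pos = set(), 0
    for w in words:
        spans.add((pos, pos + len(w)))
        pos += len(w)
    return spans


def prf(gold: List[str], pred: List[str]) -> Tuple[float, float, float]:
    """Word-level precision, recall and F1 for one sentence."""
    if "".join(gold) != "".join(pred):
        raise ValueError("gold and predicted segmentations must cover the same text")
    g, p = to_spans(gold), to_spans(pred)
    correct = len(g & p)  # words whose boundaries match exactly
    precision = correct / len(p) if p else 0.0
    recall = correct / len(g) if g else 0.0
    f1 = 2 * precision * recall / (precision + recall) if correct else 0.0
    return precision, recall, f1
```

For gold `["接触网", "绝缘子", "破损"]` and prediction `["接触网", "绝缘", "子", "破损"]`, two of four predicted words are correct, giving P = 1/2, R = 2/3 and F1 = 4/7. Over a test corpus, the correct, predicted and gold counts would normally be summed before dividing.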
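The model for missing core semantics is only outlined here. A minimal sketch of one plausible reading follows: an ontology dictionary maps segmented terms to semantic-element slots, and any core slot left empty is filled from the most similar historical record, measured by bag-of-words cosine similarity, provided that similarity clears a threshold. The slot names, ontology entries, the `min_sim` threshold and the function names are all assumptions for illustration.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple

# Hypothetical ontology dictionary: domain term -> semantic element (frame slot).
ONTOLOGY: Dict[str, str] = {
    "接触网": "equipment", "主变压器": "equipment",
    "绝缘子": "component", "吊弦": "component", "套管": "component",
    "破损": "phenomenon", "断裂": "phenomenon",
}
CORE_SLOTS = ("equipment", "component", "phenomenon")  # assumed core semantic elements

Frame = Dict[str, str]


def fill_frame(words: Sequence[str]) -> Frame:
    """Fill each slot with the first segmented word the ontology assigns to it."""
    frame: Frame = {}
    for w in words:
        slot = ONTOLOGY.get(w)
        if slot is not None and slot not in frame:
            frame[slot] = w
    return frame


def cosine(a: Sequence[str], b: Sequence[str]) -> float:
    """Bag-of-words cosine similarity between two segmented texts."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(n * cb[w] for w, n in ca.items())
    norm = math.sqrt(sum(n * n for n in ca.values())) * math.sqrt(sum(n * n for n in cb.values()))
    return dot / norm if norm else 0.0


def complete_frame(words: Sequence[str], history: List[Tuple[Sequence[str], Frame]],
                   min_sim: float = 0.3) -> Frame:
    """Fill missing core slots from the most similar historical record, if similar enough."""
    frame = fill_frame(words)
    missing = [s for s in CORE_SLOTS if s not in frame]
    if missing and history:
        ref_words, ref_frame = max(history, key=lambda rec: cosine(words, rec[0]))
        if cosine(words, ref_words) >= min_sim:
            for s in missing:
                if s in ref_frame:
                    frame[s] = ref_frame[s]
    return frame
```

A record segmented as `["吊弦", "断裂", "需", "更换"]` names no equipment; its closest historical record `["接触网", "吊弦", "断裂"]` supplies `接触网`. The threshold keeps unrelated records from filling slots when nothing similar exists.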
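Once each defect record has been mined into slot/value pairs, defect statistics reduce to grouping and counting. A minimal sketch, assuming each record is a dict keyed by hypothetical slot names such as `equipment` and `phenomenon`, with absent slots reported as `"unknown"`:

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def defect_statistics(frames: Iterable[Dict[str, str]],
                      keys: Tuple[str, ...] = ("equipment", "phenomenon")) -> Counter:
    """Count mined defect records grouped by the chosen semantic slots."""
    counts: Counter = Counter()
    for frame in frames:
        # Records missing a slot are grouped under "unknown" rather than dropped.
        counts[tuple(frame.get(k, "unknown") for k in keys)] += 1
    return counts
```

Changing `keys` gives other classifications (for example `("equipment",)` for counts per equipment type), and `Counter.most_common()` lists the most frequent defect categories first.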