
Research on Key Technologies and Application of Railway Fault and Accident Text Big Data Mining

Posted on: 2019-01-12
Degree: Doctor
Type: Dissertation
Country: China
Candidate: L B Yang
Full Text: PDF
GTID: 1361330545965369
Subject: Traffic Information Engineering & Control
Abstract/Summary:
Safety is the eternal theme of railway transportation. Dozens of safety systems have been built for China's railways, providing safety monitoring and control for trains, locomotives, works, electrical equipment, vehicles and so on. These systems produce massive monitoring and control data at the petabyte (PB) scale, most of it unstructured: voice, text, video and graphic images. Because the other forms of information can all be expressed through text description, text is the main carrier of this massive information. The most abundant, most valuable and longest-running body of text in railway traffic safety is the unstructured railway accident and fault text. It mainly comprises accident and fault tracking reports, accident databases and fault databases, and is mostly stored as Word or Excel files or on paper. Because of traditional technical barriers, this text cannot be stored effectively or analyzed, so its value cannot be mined. To enable analysis of massive railway accident and fault text and to promote the application of railway safety data, this paper achieves the following main innovative results:

(1) The overall architecture of railway accident text big data application. To tackle the difficulty of analyzing and applying unstructured railway accident text data, this paper proposes a "platform + application" pattern. It takes the unified railway big data service platform as its basis and combines it with text big data analysis technology to analyze massive unstructured railway accident and fault text, and it gives the overall architecture, technical architecture, functional structure and key technologies.

(2) Distributed storage and full-text retrieval of railway accident and fault text based on ES. To tackle the difficulty of storing and retrieving massive railway fault and accident text data, this paper proposes a distributed storage and retrieval scheme based on Elasticsearch (ES), a distributed full-text search engine built on Lucene. ES is used for distributed storage of the massive railway accident and fault text data. Chinese word segmentation is performed with Jieba, augmented with a railway domain dictionary. Inverted index technology is then applied to the segmented text for fast indexing. Finally, full-text retrieval is realized with the TF-IDF algorithm. An experiment on China Railway Corporation's accident and fault tracking reports from July to December 2016 shows that the efficiency of ES-based full-text retrieval does not decline as the number of texts increases, and that retrieval results can be ranked by the similarity between the query and the railway fault and accident text.

(3) Extraction of railway accident and fault text features based on Bi-LSTM + CRF. To tackle the difficulty of extracting key information such as name, time, location, reasons and management measures from railway accident and fault text, this paper proposes a feature extraction model based on Bi-LSTM + CRF. The model marks the accident and fault text with BIO tags and converts the sequence-tagged text to vectors with Word2Vec. The deep learning method Bi-LSTM then learns the features of the sequence-tagged text vectors automatically, and CRF learns the global features, improving the effect of railway accident and fault text feature extraction. Finally, in a TensorFlow 1.2 + Python 3.6 analysis environment, an experiment is run on one railway bureau's electrical-specialty accident and fault tracking reports from July 2016 to July 2017. The text feature extraction results show that the proposed model achieves accuracy, recall and F-score above 80% on average.

(4) Intelligent classification of railway accidents based on unbalanced text data mining. To tackle the imbalance in accident and fault text data caused by differing railway equipment mechanisms and natural conditions, this paper proposes an intelligent accident and fault classification method based on text mining for the imbalanced fault text data of railway equipment. The model first uses the SVM-SMOTE algorithm to randomly generate samples that balance the imbalanced text vectors obtained by TF-IDF conversion. It then uses several basic classifiers (Logistic Regression, Naive Bayes, SVM, etc.) and several ensemble classifiers (Random Forests and GBDT) to classify the balanced data, and finally proposes a multi-classifier ensemble learning method that takes the characteristics of the different classifiers into account. An analysis of railway signal equipment failure text data from 2012 to 2016 shows that the model can improve the accuracy, recall and F-score of fault classification.

(5) Correlation analysis and intelligent reason recommendation for railway accidents and faults based on a knowledge graph. To tackle the major barriers of traditional accident and fault analysis and its inability to support pre-accident fault prevention, and drawing on Internet knowledge graph and intelligent recommendation methods, this paper puts forward a railway accident and fault correlation analysis and intelligent reason recommendation model based on a knowledge graph. The model uses the structure of the different faulty equipment, together with the relationships between accident and fault entities and reason entities within a single specialty and across specialties, to build an accident and fault knowledge graph. This graph serves as the basis for accident and fault correlation analysis and intelligent reason recommendation. Using the ItemCF-IUF and UserCF-IIF collaborative filtering models with an improved similarity calculation, this paper realizes accident and fault correlation analysis and intelligent reason recommendation. The analysis results are also fed back into the knowledge graph, adding new knowledge to it. Finally, the features extracted from one railway bureau's accident and fault tracking reports from July 2016 to July 2017 are used for test analysis. Evaluated on two indicators, coverage and novelty, the results verify the validity of the improved similarity calculation method and identify K=20 as the best value for the similarity calculation.

In the end, taking one railway bureau's accident and fault text data as an example, this paper encapsulates the railway accident and fault text analysis algorithm models with PMML and, using a Java SSH architecture and a RESTful API interface, builds a failure text data analysis application platform for railway administrations. The platform realizes railway accident and failure full-text retrieval, accident and fault feature extraction, railway accident text segmentation, accident and fault key areas, key accident and fault analysis, accident and fault reason recommendation, accident and fault correlation analysis, etc. Through actual engineering application, this paper shows that the research results can provide practical guidance for field workers.
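As a rough illustration of the ItemCF-IUF idea named in result (5), the sketch below computes item-to-item similarity with an inverse-frequency penalty and uses the K most similar items to rank candidates. It is only a minimal sketch under stated assumptions: the abstract does not give the dissertation's improved formula, so the commonly used textbook form of the IUF weighting is shown instead; treating each fault record as a "user" and each reason label as an "item" is an assumption; and the record ids, reason labels, and the function names `itemcf_iuf_similarity` and `recommend` are all hypothetical.

```python
import math
from collections import defaultdict


def itemcf_iuf_similarity(records):
    """Item-item similarity with an IUF-style penalty (textbook form, assumed).

    records: dict mapping a record id (hypothetical fault record) to the set
    of items (hypothetical reason labels) attached to that record.
    """
    item_count = defaultdict(int)                   # records containing each item
    co = defaultdict(lambda: defaultdict(float))    # weighted co-occurrence

    for items in records.values():
        if not items:
            continue  # an empty record carries no co-occurrence information
        # Records with many items contribute less to every pair they contain.
        weight = 1.0 / math.log(1.0 + len(items))
        for i in items:
            item_count[i] += 1
            for j in items:
                if i != j:
                    co[i][j] += weight

    # Normalise by the geometric mean of the two item frequencies.
    return {
        i: {j: c / math.sqrt(item_count[i] * item_count[j])
            for j, c in related.items()}
        for i, related in co.items()
    }


def recommend(sim, known_items, k=20, top_n=3):
    """Rank unseen items using the k most similar neighbours of each known item.

    k=20 mirrors the value reported in the abstract; what K denotes in the
    dissertation is assumed here to be the neighbourhood size.
    """
    scores = defaultdict(float)
    for i in known_items:
        neighbours = sorted(sim.get(i, {}).items(),
                            key=lambda kv: (-kv[1], kv[0]))[:k]
        for j, s in neighbours:
            if j not in known_items:
                scores[j] += s
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:top_n]]


# Hypothetical toy data: three fault records and their reason labels.
toy_records = {
    "fault-1": {"reason-a", "reason-b"},
    "fault-2": {"reason-a", "reason-b", "reason-c"},
    "fault-3": {"reason-c"},
}
toy_sim = itemcf_iuf_similarity(toy_records)
toy_top = recommend(toy_sim, {"reason-a"}, k=20, top_n=1)
```

The logarithmic penalty keeps records that list many reasons from dominating the similarity scores, which is the usual motivation for the IUF variant over plain co-occurrence counts.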
Keywords/Search Tags:Fault and accident, Text big data, Full text search, Deep learning, Knowledge graph, Unbalanced data, Correlation analysis