| Safety is a central concern of rail transportation. Rail transit usually refers to the national railway system, intercity rail transit, and urban rail transit. Ensuring rail transportation safety requires staff to carry out routine monitoring, regular inspection, and maintenance of trains and rails, and a large amount of related unstructured text data has accumulated as a result. At present, such unstructured text data is often simply omitted, replaced, or deleted. Because unstructured text data is more abstract than structured data and cannot be processed directly by a computer, it is challenging to exploit. With the advancement of machine learning, natural language processing has made progress on unstructured text data, and some methods for transforming unstructured text data into structured data have been proposed. At the same time, the requirements of rail transit informatization have drawn attention to the exploitation of rail transit unstructured text data. There is an urgent need for effective methods that mine the unstructured text data in reports or records, extract useful information, and further construct cause models or accident models for risk analysis of rail transit safety. Therefore, this dissertation focuses on the unstructured text data of rail transit and proposes network-based methods to extract keywords from the text, identify accident cause text, and carry out cause analysis and risk analysis of accidents, in order to meet the development needs of rail transit informatization and intelligence. The main content and innovations of this dissertation are as follows: 1. Keyword extraction method based on complex networks. To help rail transit personnel obtain key content from a large number of accident/fault texts, and considering that existing keyword extraction methods based on network theory only consider the connections between words in a document and ignore the impact of sentences, this dissertation
proposes a new network model, NWS (New Word-Sentence), which constructs a two-layer text network composed of a word network and a sentence network and thus accounts for the influence of the sentences in a text on its words. The experimental results show that the accuracy, recall, and F value of the keywords extracted by the NWS method are better than those of the classic TF-IDF (Term Frequency-Inverse Document Frequency) method, the traditional MF (Most Frequent) method, and the network-based Word-net and TextRank methods. The accuracy, recall, and F value of NWS are 7.95%, 8.27%, and 6.54% higher than those of the Word-net method, respectively, and the average accuracy of NWS is 17.56% higher than that of the TF-IDF method. 2. A text recognition method for rail transit accidents based on keyword extraction. To accurately identify the cause text of a rail transit accident in rail transit text data for subsequent cause analysis and risk analysis, a cause text recognition method based on NWS keyword extraction is proposed. The method first uses the NWS method to obtain the characteristics of the cause text and, combined with text preprocessing, generates a user dictionary and a stop-word list related to the cause text so as to recognize the cause text. To verify the effectiveness of the method, experimental groups were formed by varying the conditions used in the method, and their results were compared with those of the proposed method. The experimental results show that the cause text recognition method achieves the best recognition performance and the highest accuracy, recall, and F value when keyword characteristics, the stop-word function, and text similarity are all considered. In terms of accuracy alone, the difference from the other experiments is up to 8.72%. As experimental verification, this dissertation identifies the cause text of Beijing subway accident reports, completes the construction of a bow-tie model, and finally proposes corresponding preventive measures. 3. Dynamic
Bayesian network accident risk analysis oriented to text data. This dissertation proposes a Bayesian network construction method based on text data that identifies the cause text in the accident text and constructs a fault tree based on text characteristics and word associations. The Bayesian network obtained from the fault tree and graph model built from the text data is then used to carry out a risk analysis of the accident. The experimental data is the 2008-2018 train derailment text data set of the US Federal Railroad Administration. The proposed method identifies the cause text of the accidents and extracts causal relationships from it to construct the fault tree, from which the corresponding Bayesian network is then obtained. The prior probabilities of the Bayesian network and the probability of the top event are obtained, and the probability of a train derailment accident is calculated to be 2.24E-04. 4. Risk analysis method of cause chain based on subway text data. This dissertation uses web crawler technology to obtain the subway operation data released by "Beijing Subway" on the Weibo platform from 2006 to 2020. It then combines the keyword extraction and cause text recognition methods proposed in this dissertation, considers useful information such as the influence of subway line factors on the network, and constructs a two-layer cause network. The network is composed of a cause network and a text sentence network. The nodes of the cause network are the identified cause texts, and the nodes of the text sentence network are the sentences that record each accident. Since the two-layer network has text attributes, the total characteristic value of each cause node can be calculated and used as its risk value. After the successive failure method is applied to the cause network to obtain the most likely cause chain of an accident, the risk
value of the cause node is substituted into the formula to calculate the risk value of each cause factor and of the cause chain. The results show that the riskiest cause chain is C9→D4, with a risk value of 27.717. This indicates that subway equipment failures caused by congestion carry a high risk of accidents and can evolve into more serious accidents, such as elevator damage. Therefore, passenger flow in subway stations should be reduced in a timely manner. |
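
The step in contribution 3 from a fault tree to a top-event probability can be illustrated with a minimal sketch. It assumes independent basic events combined through OR and AND gates, which is the standard textbook case. The event names and probabilities below are hypothetical placeholders, not values from the dissertation, and the sketch does not reproduce the dissertation's actual network or its reported result.

```python
from math import prod


def or_gate(probabilities):
    """Probability that at least one of several independent events occurs."""
    return 1.0 - prod(1.0 - p for p in probabilities)


def and_gate(probabilities):
    """Probability that all of several independent events occur."""
    return prod(probabilities)


# Hypothetical basic-event probabilities (illustrative only).
track_defect = 1.0e-4
equipment_failure = 8.0e-5
human_error = 5.0e-5
signal_fault = 2.0e-3
missed_inspection = 1.0e-2

# Intermediate event: a signal fault that also goes undetected.
undetected_signal_fault = and_gate([signal_fault, missed_inspection])

# Top event: derailment occurs if any contributing cause occurs.
top_event = or_gate(
    [track_defect, equipment_failure, human_error, undetected_signal_fault]
)

print(f"Top-event probability: {top_event:.3e}")
```

In a Bayesian network built from such a tree, each gate becomes a node with a deterministic conditional probability table, and the basic-event probabilities become the priors of the root nodes.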