| Safety is the primary goal of the air traffic control system. To prevent and reduce accidents and incidents, the safety management department and related units regularly carry out hazard source investigations, examining human, machine, environmental, and management factors to identify hazards or hidden dangers that affect safety and record them to support subsequent risk management and control. Through long-term investigation and recording, the air traffic management system has accumulated a large amount of hazard source data. The data are of many kinds, which makes them inconvenient to manage and analyze. Therefore, natural language processing technology is proposed for analyzing hazard source text. First, because the unstructured hazard text of the air traffic management system contains non-Chinese character strings, such as those for air routes and flights, as well as a large number of professional terms, an unregistered (out-of-vocabulary) word recognition method that combines rules with an information entropy algorithm is proposed to construct a lexicon for the air traffic management domain. Second, to address the class imbalance in the free text of hazard sources, which causes the classifier to overfit the majority-class samples, the SMOTE algorithm is combined with an improved cascade model to improve the classification accuracy of hazard text. The hazard source text set is first segmented into words and stripped of stop words; the TF-IDF algorithm then extracts features to vectorize the text, and the SMOTE algorithm generates synthetic samples from the vectorized minority-class text so that the class distribution of the text set tends toward balance. The cascade model is improved in two respects, the base classifiers and the weighting of the output class vectors, to improve classification of the imbalanced air traffic control hazard source text. To verify the applicability of the model, air traffic management system hazard source reports are used as the data source, and the model’s classification performance on hazard source text is evaluated experimentally. The results show that, compared with traditional classification methods, Borderline-SMOTE combined with the improved cascade model effectively improves classification of minority-class samples, thereby improving the overall classification accuracy of ATC hazard source text. Finally, taking hazard source text of the "violating work standards" type as an example, the TF-IDF algorithm is used to extract keywords from four main fields (hazard description, triggering factors, possible consequences, and mitigation measures), and the keywords are ranked by weight to analyze the four fields. Strong association rules among the keywords of the four fields are then mined with the Apriori algorithm to provide countermeasures for preventing related hazards. |
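
The oversampling step mentioned in the abstract can be illustrated with a minimal sketch of plain SMOTE interpolation: a synthetic minority sample is placed at a random point on the segment between a minority sample and one of its nearest minority neighbours. This is an illustration under stated assumptions, not the authors' implementation. The function name `smote`, the parameters `k` and `seed`, and the toy vectors standing in for TF-IDF rows are all hypothetical, and the sketch covers neither the Borderline variant nor the cascade model.

```python
import random
from math import dist


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between minority neighbours.

    minority: list of equal-length numeric vectors (at least two points).
    k: number of nearest minority neighbours to choose from (hypothetical default).
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point, excluding the point itself
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy vectors standing in for TF-IDF rows of a minority hazard category (made-up values).
minority_rows = [[0.0, 0.2, 0.9], [0.1, 0.0, 0.8], [0.0, 0.3, 0.7]]
new_rows = smote(minority_rows, n_new=4, k=2)
```

Because every synthetic row is a convex combination of two existing minority rows, it stays inside the region already occupied by that class, which is why the technique is usually applied after vectorization rather than to raw text.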