To further improve early-warning capability in forecasting coal mine safe production, this paper applies clustering and classification algorithms to the text records generated during coal mining, using big data analysis techniques. Drawing on both practical experience and theoretical research, the text data are analyzed comprehensively and systematically in order to strengthen risk pre-control in coal mine safe production. Coal mine gas accident case texts serve as the corpus. Because a large volume of text is recorded during mining, the texts are first preprocessed: data cleaning, Chinese word segmentation and part-of-speech tagging, word-to-vector conversion, and so on. For word segmentation, two widely used Chinese segmenters, IKAnalyzer (IK) and ICTCLAS (IC), are compared experimentally on a large number of Chinese text segmentation tasks. The results show that IC performs better, so IC is used for word segmentation and part-of-speech tagging of the accident texts, which converts the text information into structured data. To overcome the shortcomings of existing clustering algorithms, a Canopy_Kmeans algorithm is designed by combining Canopy clustering with the K-means algorithm, and its implementation steps are analyzed. Using the Hadoop distributed computing platform and the MapReduce programming model, a cluster analysis of the gas accident case texts is carried out, yielding 30 cluster topics and the accident texts belonging to each. The texts in each accident category are then counted, and the six most frequent categories are selected as the training set for the classification model, which is trained with the random forest algorithm. Unknown text files can then be classified and predicted, supporting the analysis and early warning of coal mine safety. By analyzing the text records of coal mine gas accident cases and improving the efficiency of processing massive text data, this paper achieves, to a certain extent, prediction and early warning for coal mine safe production, and has some application value.
31 figures, 3 tables, 93 references.
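As a rough illustration of the Canopy_Kmeans idea summarized above, the following is a minimal single-machine sketch, not the paper's Hadoop/MapReduce implementation. It assumes the texts have already been converted to numeric vectors, uses Euclidean distance, and takes the canopy centers as the initial K-means centroids, so the number of clusters follows from the canopy pass rather than being fixed in advance. The function names and the thresholds `t1` and `t2` are hypothetical choices made for this example.

```python
import math


def distance(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def canopy_centers(points, t1, t2):
    """Canopy pass: return (center, members) pairs.

    t1 is the loose threshold (canopy membership) and t2 the tight
    threshold (removal from the candidate list); t1 must exceed t2.
    """
    if t1 <= t2:
        raise ValueError("t1 (loose) must be greater than t2 (tight)")
    remaining = list(points)
    canopies = []
    while remaining:
        center = remaining.pop(0)
        # Every point within t1 of the center belongs to this canopy
        # (canopies may overlap).
        members = [p for p in points if distance(p, center) <= t1]
        canopies.append((center, members))
        # Points within t2 can no longer become centers themselves.
        remaining = [p for p in remaining if distance(p, center) > t2]
    return canopies


def nearest(point, centers):
    """Index of the center closest to point."""
    return min(range(len(centers)), key=lambda i: distance(point, centers[i]))


def kmeans(points, initial_centers, max_iter=100):
    """Plain K-means (Lloyd's iterations) seeded with the given centers."""
    centers = [tuple(float(v) for v in c) for c in initial_centers]
    for _ in range(max_iter):
        clusters = [[] for _ in centers]
        for p in points:
            clusters[nearest(p, centers)].append(p)
        # New center = mean of assigned points; keep the old center
        # if a cluster ends up empty.
        new_centers = [
            tuple(sum(col) / len(cl) for col in zip(*cl)) if cl else centers[i]
            for i, cl in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    labels = [nearest(p, centers) for p in points]
    return centers, labels


def canopy_kmeans(points, t1, t2, max_iter=100):
    """Seed K-means with canopy centers; k equals the number of canopies."""
    seeds = [center for center, _ in canopy_centers(points, t1, t2)]
    return kmeans(points, seeds, max_iter)
```

Calling `canopy_kmeans(vectors, t1, t2)` returns the final centroids and one cluster label per input vector. The appeal of the combination is that the inexpensive canopy pass supplies both the number of clusters and reasonable starting centroids, two things plain K-means otherwise leaves to the user.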