| With global terrorist attacks occurring frequently, every country invests large amounts of human, material, and financial resources each year in formulating and implementing anti-terrorism measures. However, the effectiveness of these measures depends on effective analysis and prediction of terrorist attacks, and traditional research methods based on the humanities and social sciences often struggle to uncover the complicated relationships between attacks. In recent years, data-driven quantitative analysis of terrorist attacks worldwide has been able to uncover the inherent connections between attacks and their characteristics, and thus to better guide the development of global anti-terrorism efforts. In this paper, we use data mining techniques to conduct in-depth research on the Global Terrorism Database (GTD) and obtain the following results. (1) Several methods for preprocessing the Global Terrorism Database are proposed. The GTD contains many missing values and much noisy data; without preprocessing, these greatly affect the prediction accuracy and generalization ability of a model. To address this issue, we first analyze the distribution and missingness of the features and remove features with severe missingness. Second, according to the interconnections between features, we fill in missing feature values by various means, such as web crawling, and further generate fine-grained features. Finally, we use the covariance matrix to select strongly correlated features and remove features with large information entropy. The experimental results show that the data cleaning and feature extraction methods proposed in this paper can effectively fill in missing values, reduce the noise in the data set, and improve the utilization efficiency of the data. (2) A terrorist attack division algorithm based on collaborative feature clustering is proposed, and a hybrid similarity index
is proposed to evaluate the effectiveness of the clustering. Most traditional methods for dividing terrorist attacks analyze only some countries or regions, and their results are poorly interpretable. To address this issue, we first apply game-theory-based combination weighting to the features to obtain their relative importance. Second, we use the k-prototypes algorithm to cluster the weighted features and use the hybrid similarity index to evaluate the clustering result. Finally, we use the technique for order preference by similarity to an ideal solution (TOPSIS) to obtain a hazard value for each attack event, and use the average hazard value of all events in each cluster to reorder the cluster labels. The experimental results show that, compared with other existing algorithms, the proposed algorithm divides terrorist attack events better and yields more interpretable clusters. (3) A perpetrator prediction algorithm based on the gradient boosting decision tree is proposed. Most traditional prediction methods analyze only some terrorist attacks or a few terrorist organizations. To address this issue, we first use a bag-of-words model to extract text features from the summary information of global terrorist attacks over the past 21 years and combine the text features with the weighted features. Second, we use the gradient boosting decision tree algorithm to predict the name of the perpetrator of each attack. Finally, using Sankey diagrams and complex networks, we analyze the correlation between attacking organizations and characteristics such as attack method, as well as the correlations among the terrorist attacks themselves. The experimental results show that, compared with other existing algorithms, the proposed algorithm predicts the perpetrator of an attack more accurately and, through visual analysis, can mine hidden associations between attacks, thereby helping to guide the formulation of international anti-terrorism
measures. |
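
As an illustration of the final step of contribution (2), the sketch below ranks events by their closeness to an ideal point, a procedure commonly known as TOPSIS, and then renumbers the clusters by mean hazard value. It is a minimal sketch under stated assumptions, not the implementation used in this paper: the two feature columns, the fixed weights (which stand in for the game-theory combination weights), the cluster labels (which stand in for k-prototypes output), and the function names `topsis_scores` and `relabel_by_mean_hazard` are all hypothetical.

```python
import math


def topsis_scores(rows, weights, benefit):
    """Return a closeness score in [0, 1] per row; higher means closer to the ideal point."""
    n_cols = len(weights)
    # Vector-normalise each column (guarding against an all-zero column), then weight it.
    norms = [math.sqrt(sum(r[j] ** 2 for r in rows)) or 1.0 for j in range(n_cols)]
    v = [[weights[j] * r[j] / norms[j] for j in range(n_cols)] for r in rows]
    cols = list(zip(*v))
    # Ideal point: best value per column; anti-ideal point: worst value per column.
    ideal = [max(c) if benefit[j] else min(c) for j, c in enumerate(cols)]
    anti = [min(c) if benefit[j] else max(c) for j, c in enumerate(cols)]
    scores = []
    for row in v:
        d_pos = math.dist(row, ideal)  # distance to the ideal point
        d_neg = math.dist(row, anti)   # distance to the anti-ideal point
        total = d_pos + d_neg
        scores.append(d_neg / total if total else 0.0)
    return scores


def relabel_by_mean_hazard(labels, scores):
    """Renumber clusters so that label 0 is the cluster with the highest mean score."""
    sums, counts = {}, {}
    for lab, s in zip(labels, scores):
        sums[lab] = sums.get(lab, 0.0) + s
        counts[lab] = counts.get(lab, 0) + 1
    order = sorted(sums, key=lambda lab: sums[lab] / counts[lab], reverse=True)
    mapping = {old: new for new, old in enumerate(order)}
    return [mapping[lab] for lab in labels]


# Hypothetical toy data: one row per event, columns = (fatalities, injuries).
events = [[10, 5], [0, 0], [3, 1], [8, 9]]
weights = [0.6, 0.4]           # assumed weights, not values from the paper
benefit = [True, True]         # larger values are treated as more hazardous
cluster_labels = [1, 0, 0, 1]  # assumed output of a clustering step

scores = topsis_scores(events, weights, benefit)
new_labels = relabel_by_mean_hazard(cluster_labels, scores)
```

With this ordering, a smaller cluster label always denotes a more hazardous group of events, which is what makes the labels directly interpretable.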