Criminal Case Data Completion Method Based On Improved Random Forest

Posted on: 2022-11-30
Degree: Master
Type: Thesis
Country: China
Candidate: S Y Zhang
Full Text: PDF
GTID: 2506306752965569
Subject: Automation Technology
Abstract/Summary:
Completeness is one of the important indicators of a dataset's quality. However, data is inevitably lost during collection, transmission, analysis, storage and other stages. According to statistics, the missing rate of criminal case datasets in some regions is as high as 32%, which seriously reduces the accuracy of crime analysis. To address the problem of missing data in crime datasets, this paper uses random forest (RF) and its feature analysis, LightGBM and other machine learning methods to build a data completion model, and uses the Chicago crime dataset to verify the model. The specific innovations and work are as follows.

Firstly, this paper proposes a missing data completion algorithm (RF-KNN) based on KNN and RF. The algorithm first uses the KNN model to select an appropriate K value, which determines the parameters for building the random forest. It then builds an attribute classification prediction model according to the attribute-splitting characteristics of random forest and completes the missing attribute values. The experimental results show that RF-KNN not only effectively reduces the size of the dataset but also reduces the computational complexity of model training. The classification accuracy is improved by about 4.8% compared with the original RF model.

Secondly, this paper uses the RF-KNN model to optimize the classification of the original crime dataset. On this basis, it proposes a data completion model that fuses an improved LightGBM with a DNN. The innovation of the model lies in the use of PCA dimensionality reduction and feature importance analysis to analyze the associated attributes of the Chicago crime dataset. A DNN is used for embedding learning on the categorical features to obtain their vectorized representation, which replaces the original features when training the subsequent tree model. In the LightGBM model, the LR algorithm replaces the final weighted average of the tree structure to produce the final classification prediction.

Finally, to further verify the effectiveness and generalization of the proposed hybrid model, the DNN-LightGBM-LR model is compared with additional models on the Chicago crime data. Evaluation indicators such as the confusion matrix, the ROC curve and the logarithmic loss (logloss) are used to assess the models. The experimental results show that the improved data completion model predicts missing data more realistically and effectively.
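
Two of the evaluation indicators named in the abstract, the confusion matrix and logloss, can be illustrated with a minimal sketch. This is not the thesis's code: the function names, the label values and the predicted probabilities below are hypothetical and chosen only for illustration. It assumes a completion model that outputs a probability for each candidate value of a missing categorical attribute, and that the true values are known because they were masked on purpose.

```python
import math
from collections import Counter


def confusion_matrix(y_true, y_pred, labels):
    """Count (true, predicted) pairs; rows are true labels, columns are predicted."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]


def log_loss(y_true, probs, eps=1e-15):
    """Mean negative log-probability that the model assigned to the true label.

    probs[i] is a dict mapping each candidate label to its predicted probability.
    Probabilities are clipped to [eps, 1 - eps] so that log(0) never occurs.
    """
    total = 0.0
    for label, dist in zip(y_true, probs):
        p = min(max(dist.get(label, 0.0), eps), 1.0 - eps)
        total -= math.log(p)
    return total / len(y_true)


# Hypothetical masked values of one attribute and a model's predicted distributions.
labels = ["THEFT", "BATTERY"]
y_true = ["THEFT", "BATTERY", "THEFT", "THEFT"]
probs = [
    {"THEFT": 0.9, "BATTERY": 0.1},
    {"THEFT": 0.2, "BATTERY": 0.8},
    {"THEFT": 0.4, "BATTERY": 0.6},
    {"THEFT": 0.7, "BATTERY": 0.3},
]

# The completed value is the most probable candidate for each record.
y_pred = [max(dist, key=dist.get) for dist in probs]

cm = confusion_matrix(y_true, y_pred, labels)
loss = log_loss(y_true, probs)
print(cm)
print(round(loss, 4))
```

The confusion matrix shows which attribute values are confused with which others, while logloss also penalizes confident wrong predictions, so the two indicators complement each other when comparing completion models.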
Keywords/Search Tags: Data completion, Crime data, Random forest, Feature analysis