Criminal Case Data Completion Method Based On Improved Random Forest

Posted on:2022-11-30

Degree:Master

Type:Thesis

Country:China

Candidate:S Y Zhang

Full Text:PDF

GTID:2506306752965569

Subject:Automation Technology

Abstract/Summary:

Completeness is one of the key indicators of dataset quality. However, data are inevitably lost during collection, transmission, analysis, storage and other stages. According to statistics, the missing rate of criminal case datasets in some regions is as high as 32%, which seriously reduces the accuracy of crime analysis. To address the problem of missing data in crime datasets, this thesis uses random forest (RF) and its feature analysis, LightGBM and other machine learning methods to build a data completion model, and verifies the model on the Chicago crime dataset. The specific innovations and contributions are as follows.

Firstly, this thesis proposes a missing data completion algorithm (RF-KNN) based on KNN and RF. The algorithm first uses a KNN model to select an appropriate K value, which determines the parameters for building the random forest. An attribute classification prediction model is then built according to the attribute-splitting characteristics of the random forest, and the missing attribute values are completed effectively. The experimental results show that RF-KNN not only effectively reduces the size of the dataset but also reduces the computational complexity of model training. Classification accuracy improves by about 4.8% compared with the original RF model.

Secondly, this thesis uses the RF-KNN model to optimize the classification of the original crime dataset. On this basis, a data completion model that fuses an improved LightGBM with a DNN is proposed. The innovation of the model lies in using PCA dimensionality reduction and feature importance analysis to analyze the associated attributes of the Chicago crime dataset. A DNN performs embedding learning on the categorical features to obtain their vectorized representations, which replace the original features when training the subsequent tree model. In the LightGBM model, the LR algorithm replaces the final weighted average of the tree structure for the final classification prediction.

Finally, to further verify the effectiveness and generalization of the proposed hybrid model, the DNN-LightGBM-LR model is compared with additional models on the Chicago crime data. Evaluation metrics such as the confusion matrix, ROC curve and logarithmic loss (logloss) are used to assess the models. The experimental results show that the improved data completion model predicts missing data more realistically and effectively.
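
The core idea of classifier-based completion described above can be illustrated with a minimal sketch: treat the attribute with missing values as a classification target, train a random forest on the complete records, and predict the attribute for the incomplete ones. This is not the thesis's RF-KNN algorithm (the KNN-based parameter selection is omitted). The function name `complete_missing_attribute`, the synthetic two-cluster data, and the labels `THEFT` and `BATTERY` are assumptions made purely for illustration, and scikit-learn's `RandomForestClassifier` stands in for whatever implementation the author used.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def complete_missing_attribute(X, y, n_estimators=100, seed=0):
    """Fill missing entries of y (marked as None) with random forest predictions.

    X: numeric feature matrix, one row per record (assumed fully observed).
    y: categorical attribute to complete; None marks a missing value.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=object)
    missing = np.array([value is None for value in y])

    # Nothing to fill, or nothing to learn from: return the input unchanged.
    if not missing.any() or missing.all():
        return y.copy()

    # Train only on records whose attribute value is known.
    model = RandomForestClassifier(n_estimators=n_estimators, random_state=seed)
    model.fit(X[~missing], y[~missing].astype(str))

    # Predict the attribute for the records where it is missing.
    completed = y.copy()
    completed[missing] = model.predict(X[missing])
    return completed


# Synthetic example (hypothetical data): two well-separated clusters,
# each associated with one category of the attribute to be completed.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(0.0, 0.5, size=(20, 2)),
    rng.normal(5.0, 0.5, size=(20, 2)),
])
y = ["THEFT"] * 20 + ["BATTERY"] * 20
y[3] = None    # simulate a missing value in the first cluster
y[27] = None   # simulate a missing value in the second cluster

completed = complete_missing_attribute(X, y)
print(completed[3], completed[27])
```

On real crime data the categorical features would first need to be encoded numerically, and the completion quality would be checked on held-out known values, for example with the confusion matrix and logloss metrics mentioned in the abstract.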

Keywords/Search Tags:

Data completion, Crime data, Random forest, Feature analysis

Related items

1	Research On DNS Covert Channels Detection By Static Data Analysis In County Government Networks With The Method Of Random Forest
2	Research And Implementation Of Community Correction Staff Risk Assessment Technology
3	Research On Background Modeling Algorithm Based On Fusion Color Data And Depth Data
4	Review And Perfection Of Criminal Law Protection System Of Data Security In The Era Of Big Data
5	Analysis And Application Of Crime Hot Spot Data Based On Temporal And Spatial Features
6	Research And Application Of Data Governance System Based On L New Area Government Big Data Platform
7	Research On The Determination Of The Crime Of Illegally Obtaining Computer Information System Data
8	Legal Issues Of Data Transaction
9	Interpretative Data Analysis And Its Application In Crime Pattern Mining And Forecasting
10	Research On Auxiliary Judgment Of Court Based On Data Mining Of Judgment Books