| In recent years, the country has carried out many law popularization campaigns so that people know and understand the law, with the aim of reducing the crime rate. Relying on human resources alone to popularize the law would be a long and difficult process. At present, many people involved in a case come to understand it by seeking help from legal professionals. For these professionals, such law popularization is simple, repetitive work, so using artificial intelligence to assist with it has become a general trend. Due to the limitations of technology and equipment, artificial intelligence cannot completely replace lawyers and judges, so most artificial intelligence systems in the legal field play a supplementary role. To popularize the law, reduce the workload of legal professionals, and assist in handling cases, this thesis develops a criminal behaviour classification system. It first pre-processes Chinese texts describing various criminal behaviours and uses multiple single learning models to predict the crime category of a given criminal behaviour. It then incorporates keywords to distinguish easily confused categories of criminal behaviour. Finally, it integrates the learning models and adjusts their weights to obtain the final prediction. The main contributions of this thesis are as follows: 1. We add a feature filtering method based on word embedding to the traditional text feature selection process, to address the high dimensionality of the word vectors and the sparsity of the vector matrix. The traditional approach uses all the words in the training set as features to construct a word vector space, and uses that space to convert text into numerical vectors during testing. In this thesis, before constructing the word vector space from the training set, we use the word embedding method to obtain a filtered vocabulary and use this new vocabulary to construct the word vector space. During training and testing, we use this word vector space to convert the original data into numerical vectors and use TF-IDF to obtain the weight matrix used by the classification model. After this processing, the dimension of the weight matrix is reduced by one third and the vector sparsity problem is eased. 2. We integrate multiple classification models to improve classification accuracy. Because the models have different emphases, we assign a weight to each model and fuse them accordingly to obtain the final result. The weight of a single model reflects the proportion that its results contribute to the overall result. We determine the best weight distribution experimentally. The fused model achieves higher classification accuracy than any single model. 3. We use TextRank to obtain keywords for the charges in order to resolve confusion in crime classification. Some easily confused crimes are hard to distinguish with the classification model alone, so we add crime keywords to distinguish and verify them. Specifically, we use TextRank to obtain a keyword list for each crime, compare the keyword lists within each group of easily confused crimes, and use the words that the lists do not share as the keyword lists for telling those crimes apart. Moreover, we study the qualifying terms in criminal law that tend to cause confusion when classifying crimes, and use them to verify and revise the keyword lists obtained. We find that incorporating keywords and rules effectively resolves confusion in criminal behaviour classification. In summary, pre-processing and post-processing the data improve the accuracy of criminal behaviour classification. The ideas behind these steps may also help improve classification accuracy for Chinese text in other fields. |
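
The second and third contributions can be illustrated with a minimal sketch. It rests on assumptions that the abstract does not confirm: that fusion is a weighted average of per-model class probabilities, that the weights are found by a coarse grid search, and that the distinguishing keywords are the non-shared entries of two keyword lists. All function names, the toy probabilities, and the English keywords are hypothetical and are not taken from the thesis.

```python
from itertools import product


def fuse(prob_lists, weights):
    """Weighted average of per-model class probabilities for one sample."""
    total = sum(weights)
    n_classes = len(prob_lists[0])
    return [
        sum(w * probs[c] for w, probs in zip(weights, prob_lists)) / total
        for c in range(n_classes)
    ]


def predict(prob_lists, weights):
    """Index of the class with the highest fused probability."""
    fused = fuse(prob_lists, weights)
    return max(range(len(fused)), key=fused.__getitem__)


def accuracy(model_probs, labels, weights):
    """model_probs[m][i] is the probability vector of model m for sample i."""
    hits = 0
    for i, label in enumerate(labels):
        per_model = [probs[i] for probs in model_probs]
        hits += predict(per_model, weights) == label
    return hits / len(labels)


def search_weights(model_probs, labels, steps=10):
    """Assumed search strategy: try every weight vector on a grid summing to 1."""
    best_w, best_acc = None, -1.0
    for combo in product(range(steps + 1), repeat=len(model_probs)):
        if sum(combo) != steps:
            continue
        weights = [c / steps for c in combo]
        acc = accuracy(model_probs, labels, weights)
        if acc > best_acc:
            best_w, best_acc = weights, acc
    return best_w, best_acc


def distinguishing_keywords(keywords_a, keywords_b):
    """For two easily confused crimes, keep only the keywords they do not share."""
    shared = set(keywords_a) & set(keywords_b)
    only_a = [k for k in keywords_a if k not in shared]
    only_b = [k for k in keywords_b if k not in shared]
    return only_a, only_b


# Toy data (hypothetical): two models, four samples, two crime categories.
labels = [0, 1, 0, 1]
model_a = [[0.9, 0.1], [0.2, 0.8], [0.4, 0.6], [0.1, 0.9]]
model_b = [[0.8, 0.2], [0.6, 0.4], [0.9, 0.1], [0.3, 0.7]]
model_probs = [model_a, model_b]

best_weights, best_accuracy = search_weights(model_probs, labels)
print("best weights:", best_weights, "accuracy:", best_accuracy)

# Hypothetical keyword lists for two confusable charges.
theft_kw = ["property", "secretly", "steal"]
robbery_kw = ["property", "violence", "coercion"]
print(distinguishing_keywords(theft_kw, robbery_kw))
```

In this toy data each model alone misclassifies one sample, while a suitable weighting classifies all four correctly, which mirrors the claim that the fused model outperforms any single model. In practice the weights would be tuned on held-out data rather than on the samples used for evaluation.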