With the widespread application of information technology in the judicial sector, more and more judicial data has been migrated to information systems. Under the guidance of the Supreme People’s Court of the People’s Republic of China, China has established Judgements Online, Judicial Process Information Online and other judicial platforms, which have brought judicial business online to a certain extent. Judicial informatization has in turn promoted the development of intelligent justice, and the Ministry of Justice of the People’s Republic of China has put forward the directive of “digital rule of law and intelligent justice”. Representative intelligent judicial applications currently include automatic recommendation of similar cases, automatic prediction of applicable legal provisions and automatic prediction of sentencing. These applications are built on deep learning models, which require a large amount of judicial text as training data.

The implementation of such applications currently faces a problem: the available judicial text is insufficient in quantity and low in quality, which leads to poor generalization of the models and poor results in practice. Obtaining judicial text that is both sufficient in quantity and high in quality has therefore become a need for researchers in judicial technology. This thesis designs and implements a system for augmenting high-quality judicial text. Compared with traditional data augmentation methods, the system focuses on the judicial domain and approaches data augmentation from the perspective of data quality.

For semi-structured data such as judicial documents, data quality can be divided into objective and subjective dimensions according to the characteristics of the judicial domain, and the augmentation methods are designed around these two dimensions. Augmentation along the objective quality dimensions targets the parts of judicial documents that have structural features; it combines the basic rules of text data augmentation with objective information theory and augments judicial texts along the summarized data quality dimensions. The thesis proposes five augmentation methods for the objective dimensions: delay augmentation, authenticity augmentation, integrity augmentation, consistency augmentation and readability augmentation, each corresponding to one objective quality dimension. Augmentation along the subjective quality dimensions targets the data quality dimensions that are not reflected in the structure of judicial documents. Here the thesis focuses on the semantic information contained in the texts: to simulate human subjective judgment, it uses machine learning, learning the features of judicial texts with word2vec, and then implements judicial language feature augmentation. It also implements emotional neutrality augmentation based on sentiment analysis.

The system is implemented with Django, a Python web framework. Augmented judicial documents are output as text files and stored in HDFS, whose distributed storage improves the scalability of file storage. Experiments show that the judicial text augmentation system designed in this thesis can produce high-quality augmented judicial texts, which can then be used to improve the quality of machine learning models for intelligent judicial applications.