| Similar case matching is the task of retrieving, from a case base, cases whose criminal circumstances resemble those of a given case, in order to support intelligent judgment, law recommendation, and other auxiliary trial work. Determining whether two cases are "similar cases" is difficult: case files are long, words are polysemous, much of the information is redundant, and courts describe their findings of fact in non-standardized natural language. To address these challenges, this thesis proposes a multi-task fusion case matching model that mitigates polysemy in long text to a certain extent and improves the accuracy of long-text matching. The thesis consists of three parts. First, to address the polysemy of long text across different contexts, a similar case matching model based on single semantics is proposed. The BERT and BiLSTM models are combined to extract document features and to establish an efficient mapping between word embeddings and vector representations of the text. Because the statement of facts in a case is generally 500 to 800 words long while the BERT model accepts no more than 512 words of input, the model adopts a backward truncation method and uses BERT and BiLSTM to extract and fuse the global and local features of the case information. This reduces, to a certain extent, the influence of context on semantics, and the extracted semantic information further improves the performance of the case matching model. Second, because of the BERT model's limit on input length, and because the description of facts in most legal documents far exceeds 512 words, the single-semantics matching model loses some information features of the case documents and extracts case elements incompletely. To solve this problem of incomplete extraction of case fact elements, the thesis then proposes a case matching model based on multi-task fusion. The RoBERTa model is used to process the input judgment documents hierarchically while similarity is predicted at the same time. By combining the Cross Entropy Loss function with the Cosine Embedding Loss function, the model optimizes both the similarity between sentence pairs and the similarity between sentences from the perspective of multi-task learning. Experimental results show that the model achieves good accuracy in similar case matching: compared with the single-semantics case matching model, accuracy increases by 2.78%, which provides a technical basis for further improving the similar case matching system. Third, building on the research into similar case matching algorithms, a similar case matching system is designed and implemented. The system provides case retrieval and case matching functions, offers an initial application of case matching, and gives strong support to intelligent auxiliary similar-case matching. |
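
The abstract names the two loss terms but not how they are combined, so the following is only a minimal sketch of one plausible multi-task objective, not the thesis's actual implementation. It assumes a weighted sum with a hypothetical weight `alpha`, a classification head that produces `logits` for the match label, and two document embeddings `u` and `v`; all function and parameter names are illustrative. The two terms follow the standard definitions of cross-entropy and cosine embedding loss.

```python
import math


def cross_entropy(logits, target):
    """Cross-entropy of one example: -log softmax(logits)[target]."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[target]


def cosine_similarity(u, v, eps=1e-8):
    """Cosine of the angle between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / max(norm_u * norm_v, eps)


def cosine_embedding_loss(u, v, label, margin=0.0):
    """Standard cosine embedding loss.

    label = 1  : the pair should be similar   -> 1 - cos(u, v)
    label = -1 : the pair should be dissimilar -> max(0, cos(u, v) - margin)
    """
    cos = cosine_similarity(u, v)
    if label == 1:
        return 1.0 - cos
    return max(0.0, cos - margin)


def multitask_loss(logits, target, u, v, label, alpha=0.5):
    """Weighted sum of the two task losses.

    alpha is an assumed hyperparameter; the source does not state
    how the two losses are weighted.
    """
    ce = cross_entropy(logits, target)
    cos_loss = cosine_embedding_loss(u, v, label)
    return alpha * ce + (1.0 - alpha) * cos_loss


if __name__ == "__main__":
    # Hypothetical toy inputs: 2-class logits and 2-d embeddings.
    loss = multitask_loss([2.0, 0.5], 0, [1.0, 0.2], [0.9, 0.1], 1)
    print(f"combined loss: {loss:.4f}")
```

In this sketch the cross-entropy term trains the match/no-match classifier, while the cosine term pulls the embeddings of matching cases together and pushes non-matching ones apart. In a real training loop the same quantities would typically come from a deep learning framework's built-in loss functions rather than hand-written ones.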