Computing Document Similarity For The Legal Case Retrieval

Posted on: 2019-01-06
Degree: Master
Type: Thesis
Country: China
Candidate: L J Li
Full Text: PDF
GTID: 2428330548993825
Subject: Computer application technology
Abstract/Summary:
Document similarity calculation is a basic task in legal case retrieval. However, current legal case retrieval technology is not yet mature. This thesis analyzes and compares existing document similarity calculation methods, covering both traditional methods and deep learning methods, and then designs and implements more effective case text similarity models and algorithms that address the deficiencies of the existing methods. The main work of this thesis is as follows:

(1) An annotated data set for legal case text similarity is developed, containing 1225 distinct document pairs. At present there is no public Chinese legal case data set, nor a Chinese document similarity data set built for other tasks, so this data set is the basis of the experiments in this thesis.

(2) A document similarity calculation method based on a bipartite graph and syntactic information is proposed and designed. The thesis first designs and implements a traditional baseline system based on lexical semantic information and TF-IDF. Because this baseline does not consider the complete information in the keyword vectors and lacks syntactic information, the thesis uses a bipartite graph to improve the calculation over keyword vectors and integrates syntactic information when computing document similarity.

(3) A document similarity calculation method based on the attention mechanism and document content compression is proposed and designed. First, the thesis designs and implements a deep learning baseline system: a Siamese network model based on long short-term memory (LSTM). Because this baseline does not consider the importance of different words in the document, the thesis proposes and designs a Siamese network model combined with the attention mechanism. Next, because treating the whole text as a single model input leads to sparse data, a hierarchical attention mechanism is used to improve the representation of documents in the Siamese network. Finally, because the Siamese network model with hierarchical attention may overlook important sentences in the document, the thesis proposes a two-step document similarity calculation method that incorporates case content compression.

A series of experiments is conducted on the proposed document similarity algorithms, both those based on traditional methods and those based on deep learning. The experimental results show that the proposed methods achieve higher performance than the baseline systems.
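
As a rough illustration of the kind of TF-IDF baseline mentioned in item (2), the following minimal Python sketch scores two tokenized documents by the cosine similarity of their TF-IDF vectors. It is not the thesis's implementation: the function names `tfidf_vectors` and `cosine`, the smoothed IDF formula, and the toy English tokens are all assumptions made for illustration, and real Chinese case texts would first need word segmentation.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict: term -> weight) per tokenized document."""
    n_docs = len(docs)
    # Document frequency: how many documents contain each term at least once.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        # Smoothed IDF (an assumption, not taken from the thesis) keeps every weight positive.
        vectors.append({
            term: (count / len(doc)) * (math.log((1 + n_docs) / (1 + df[term])) + 1.0)
            for term, count in tf.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(weight * weight for weight in u.values()))
    norm_v = math.sqrt(sum(weight * weight for weight in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical, already-tokenized toy "cases" (not data from the thesis).
docs = [
    ["defendant", "theft", "property", "sentence"],
    ["defendant", "theft", "vehicle", "sentence"],
    ["contract", "breach", "damages", "payment"],
]
vecs = tfidf_vectors(docs)
sim_01 = cosine(vecs[0], vecs[1])  # two theft cases sharing three terms
sim_02 = cosine(vecs[0], vecs[2])  # theft case vs. contract case, no shared terms
print(sim_01 > sim_02)
```

A bag-of-words score like this ignores word order and syntax, which is the limitation the bipartite graph and syntactic information in item (2) are meant to address.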
Keywords/Search Tags: document similarity, bipartite graph, Siamese network, attention mechanism