Computing Document Similarity For The Legal Case Retrieval

Posted on:2019-01-06

Degree:Master

Type:Thesis

Country:China

Candidate:L J Li

Full Text:PDF

GTID:2428330548993825

Subject:Computer application technology

Abstract/Summary:

Document similarity calculation is a fundamental task in legal case retrieval, yet current legal case retrieval technology is not mature. This thesis analyzes and compares existing document similarity calculation methods, covering both traditional methods and deep learning methods, and then designs and implements more effective models and algorithms for case text similarity that address the deficiencies of the existing methods. The main work of this thesis is as follows:

(1) An annotated data set for legal case text similarity is developed, containing 1225 distinct document pairs. At present there is no public Chinese legal case data set, nor a Chinese document similarity data set from other tasks, so this data set forms the basis of our experiments.

(2) A document similarity calculation method based on a bipartite graph and syntactic information is proposed and designed. This thesis first designs and implements a traditional baseline system based on lexical semantic information and TF-IDF. Because the baseline's keyword vectors do not capture complete information and the baseline lacks syntactic information, this thesis uses a bipartite graph to improve the calculation of keyword vectors and integrates syntactic information when calculating document similarity.

(3) A document similarity calculation method based on an attention mechanism and document content compression is proposed and designed. First, this thesis designs and implements a deep learning baseline system: a Siamese network model based on long short-term memory (LSTM). Because this baseline does not consider the importance of different words in the document, this thesis proposes and designs a Siamese network model combined with an attention mechanism. Next, to address the data sparsity that arises when the whole text is fed to the model as a single input, a hierarchical attention mechanism is used to improve the representation of documents in the Siamese network. Finally, because the Siamese network model with hierarchical attention may overlook important sentences in the document, this thesis proposes a two-step document similarity calculation method that incorporates case content compression.

A series of experiments is conducted on the proposed document similarity algorithms, both those based on the traditional method and those based on the deep learning method. The experimental results show that our methods achieve higher performance than the baseline systems.
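The traditional baseline in item (2) relies on TF-IDF. As a rough illustration only, and not the thesis's actual implementation, the sketch below computes TF-IDF vectors and their cosine similarity for documents that are assumed to be already tokenized (Chinese case texts would first need word segmentation). The function names, the smoothed IDF formula, and the sample tokens are all assumptions made for this example; the lexical semantic information, bipartite graph, and syntactic features described in the abstract are not modeled here.

```python
import math
from collections import Counter


def tf_idf_vectors(tokenized_docs):
    """Return one sparse TF-IDF vector (dict: term -> weight) per document.

    Assumption: each document is already a list of tokens.
    """
    n_docs = len(tokenized_docs)
    # Document frequency: in how many documents each term appears.
    df = Counter()
    for tokens in tokenized_docs:
        df.update(set(tokens))

    vectors = []
    for tokens in tokenized_docs:
        counts = Counter(tokens)
        total = len(tokens)
        vec = {}
        for term, count in counts.items():
            tf = count / total
            # Smoothed IDF (an illustrative choice): terms that occur in
            # every document still receive a nonzero weight.
            idf = math.log((1 + n_docs) / (1 + df[term])) + 1.0
            vec[term] = tf * idf
        vectors.append(vec)
    return vectors


def cosine_similarity(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # An empty document is treated as similar to nothing.
    return dot / (norm_u * norm_v)


if __name__ == "__main__":
    # Hypothetical, hand-tokenized case snippets used only for illustration.
    docs = [
        ["theft", "property", "defendant"],
        ["theft", "property", "sentence"],
        ["contract", "breach", "damages"],
    ]
    vecs = tf_idf_vectors(docs)
    print(cosine_similarity(vecs[0], vecs[1]))  # the two theft cases
    print(cosine_similarity(vecs[0], vecs[2]))  # theft case vs. contract case
```

A bag-of-words score of this kind ignores word order and syntax, which is the limitation the abstract cites as motivation for adding bipartite-graph keyword vectors and syntactic information.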

Related items

1	Digital Watermark Of Document Image Removal Technology And Similarity Test
2	Similar Text Discrimination Based On Siamese Network
3	Research On Siamese Trackers Based On Attention Mechanism
4	Siamese Network Tracking Algorithm Based On Dual Attention Similarity And Depth Map
5	Research On Object Tracking Algorithm Based On Deep Siamese Network
6	Research On Visual Tracking With Siamese
7	Research On Multi-document Summarization Models With Graph Structured Semantics Representation And Redundancy Control Mechanism
8	Research On Object Tracking Algorithm Of Siamese Network Fuse With Attention Mechanism
9	Video Object Tracking Based On Siamese Network And Attention Mechanism
10	Research On Siamese Network Object Tracking Algorithm Based On Attention Mechanism