
Research On Automatic Chinese Judgment Documents Summarization Based On Deep Learning

Posted on: 2022-06-23
Degree: Master
Type: Thesis
Country: China
Candidate: C Cao
GTID: 2556306725483804
Subject: Engineering
Abstract/Summary:
The Internet increasingly reaches into every aspect of people's lives, recording and storing the large volume of data constantly produced in every field. In the legal field, China Judgments Online, set up by the Supreme People's Court of the People's Republic of China, publishes the judgment documents of legally binding judgments from people's courts at all levels. In recent years, as the country's legal system has improved and citizens have become more aware of safeguarding their rights, the number of lawsuits of all kinds has kept rising, and the number of new judgment documents published online has grown rapidly. A judgment document usually runs to about 2,000 words, so reading one in full is time-consuming and tiring for practitioners and ordinary citizens who need to consult it. Applying summarization technology to judgment documents has therefore become an urgent requirement, both to reduce the cost of reading them and to improve the efficiency of practitioners and ordinary readers.

Deep learning is currently a very popular approach to summarization. In recent years, many works have applied deep neural networks to summarization tasks on various text types, including news, with good results. Applying these models directly to Chinese judgment documents, however, is not satisfactory. First, judgment documents usually contain thousands of words, far beyond the processing capability of most models. Models based on the attention mechanism can capture context over long distances, but because the cost of the attention computation grows quadratically with sequence length, processing long text becomes very expensive. Second, because vocabulary size is limited, many words in judgment documents, such as personal names and place names, cannot be mapped properly. They appear as <UNK> tags in the input sequence, and the summaries these models generate cannot contain these important words.

To solve these problems, this thesis proposes a new summarization model for Chinese judgment documents, Judgformer Ptr. Judgformer Ptr adopts a multi-task design. First, a key sentence extraction model reads the sentences in the input sequence and extracts the key sentences. Then the key sentence encoder takes these important sentences as input and encodes them through multi-layer self-attention. In the decoding stage, Judgformer Ptr introduces a Pointer-Generator Network that automatically switches between word generation mode and word copy mode at every time step. When a generated summary needs to contain words such as personal names and place names, the model can go beyond the restriction of the vocabulary and copy them directly from the source text. To help the Pointer-Generator Network pay more attention to the important words in the source text, the copy range of the Pointer-Generator Network is constrained by a bottom-up attention step.

This thesis compares the performance of Judgformer Ptr with the baseline method on the open data set of the summarization task in the Chinese AI and Law Challenge, and verifies the effectiveness of each module of Judgformer Ptr through variant experiments. It also examines the model's sensitivity to the number of training samples by training it on a small data set. Experiments show that the proposed model can generate summaries of Chinese judgment documents automatically, with the BLEU, ROUGE-1, ROUGE-2 and ROUGE-L scores of its output reaching 0.550, 0.601, 0.351 and 0.526, respectively. Finally, this thesis designs and develops a Web prototype system for generating and displaying summaries of Chinese judgment documents online.
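
The switch between generation mode and copy mode described in the abstract can be illustrated with the standard pointer-generator mixture, P(w) = p_gen * P_vocab(w) + (1 - p_gen) * (sum of attention on source positions holding w). The sketch below is a minimal illustration of that general formula in plain Python, not the thesis's implementation. The function name `final_distribution`, the `copy_mask` argument (a stand-in for restricting the copy range to selected source tokens), and all toy words and probabilities are assumptions made for the example.

```python
def final_distribution(vocab, p_vocab, source_tokens, attention, p_gen, copy_mask=None):
    """Mix generation and copy probabilities over an extended vocabulary.

    vocab:         words in the fixed vocabulary
    p_vocab:       generation probability for each vocab word (sums to 1)
    source_tokens: words of the source text, possibly out of vocabulary
    attention:     attention weight for each source token (sums to 1)
    p_gen:         probability of generating instead of copying, in [0, 1]
    copy_mask:     optional 0/1 flags; 0 forbids copying that source token
                   (hypothetical stand-in for a constrained copy range)
    """
    if copy_mask is not None:
        masked = [a * m for a, m in zip(attention, copy_mask)]
        total = sum(masked)
        # Renormalize over the allowed tokens; keep the original attention
        # if the mask would remove every token.
        if total > 0:
            attention = [a / total for a in masked]

    # Generation part: scale the vocabulary distribution by p_gen.
    dist = {word: p_gen * p for word, p in zip(vocab, p_vocab)}

    # Copy part: add (1 - p_gen) * attention to each source word. Words that
    # are not in the vocabulary get a new entry, so they can still be output.
    for token, a in zip(source_tokens, attention):
        dist[token] = dist.get(token, 0.0) + (1.0 - p_gen) * a
    return dist


# Toy example: "Zhang" and "Nanjing" are out of vocabulary.
vocab = ["<UNK>", "court", "ruled", "defendant"]
p_vocab = [0.4, 0.3, 0.2, 0.1]
source_tokens = ["Zhang", "defendant", "Nanjing"]
attention = [0.6, 0.1, 0.3]

dist = final_distribution(vocab, p_vocab, source_tokens, attention, p_gen=0.5)
best = max(dist, key=dist.get)
print(best, round(dist[best], 3))
```

In this toy setting the most probable output is the out-of-vocabulary name taken from the source, which a model limited to the fixed vocabulary could only emit as <UNK>.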
Keywords/Search Tags:Text summarization, Judgment document, Neural network, Attention mechanism