
Study On The Similarity Of Legal Texts

Posted on: 2019-02-07
Degree: Master
Type: Thesis
Country: China
Candidate: T Liu
Full Text: PDF
GTID: 2428330596960908
Subject: Computer technology
Abstract/Summary:
In recent years, the Supreme People's Court of China has published a large number of judgment documents, which are valuable material for legal analysis. Because these documents are unstructured, extracting their key information and using it to recommend similar cases is an important and pressing problem. Technically, the problem is one of text similarity computation in the legal domain. Judgment documents differ widely in the domain knowledge they contain, so it is impractical to build a single domain-independent similarity model for all of them. Given practical demand and application conditions, this thesis focuses on the similarity of medical dispute judgment documents. It builds a similarity model for these documents based on medical domain knowledge and, on top of that model, designs a recommendation system that retrieves similar medical dispute documents to support decision making.

Similarity research on medical dispute documents faces several problems. First, domain knowledge is crucial for computing the similarity of domain-specific documents, which raises the question of how to integrate it into the similarity model. Second, a common approach in document similarity computation is to filter documents by category before the detailed similarity computation, and supervised learning is effective for document categorization. However, no public labeled data set is available for the medical dispute domain, and labeling one manually is expensive, so an efficient categorization strategy is needed. Finally, the heavy redundancy in medical dispute documents leads to relatively large errors in conventional similarity models.

To address these problems, the research proceeds as follows. First, we read and analyzed a large number of medical dispute documents and, following the suggestions of medical experts, established a multi-level medical dispute category system, which provides a solid foundation for the research. The thesis then adopts the common approach of filtering documents by category before the detailed similarity computation. To compensate for the lack of labeled data, it introduces an active-learning-based classification method that achieves high accuracy with a small labeled data set. For computing similarity within a category, it proposes an event-based document representation: dispute elements are extracted from each document and assembled into a dispute event, and document similarity is computed from the similarity of these events. Finally, a series of comparative experiments among the event-based similarity model, the vector space model, and the topic model evaluates the effectiveness of the proposed method. The experiments show that the event-based similarity model outperforms the traditional vector space model and topic model.
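
The filter-then-compare pipeline described above can be illustrated with a minimal sketch. It is not the thesis's implementation: the abstract does not say how dispute elements are extracted, which slots an event has, or how event similarity is defined. The slot names (`treatment`, `harm`, `fault`), the weighted Jaccard measure, and all names in the code (`DisputeDocument`, `event_similarity`, `recommend`) are assumptions made for illustration, and the sample documents are invented.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical event representation: slot name -> set of extracted dispute elements.
Event = Dict[str, FrozenSet[str]]


@dataclass
class DisputeDocument:
    doc_id: str
    category: str  # label from an assumed category system
    event: Event   # assumed output of an upstream extraction step


def jaccard(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Jaccard overlap of two element sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def event_similarity(e1: Event, e2: Event, weights: Dict[str, float]) -> float:
    """Weighted average of per-slot Jaccard overlaps (an assumed measure)."""
    total = sum(weights.values())
    if total == 0:
        return 0.0
    empty: FrozenSet[str] = frozenset()
    score = sum(
        w * jaccard(e1.get(slot, empty), e2.get(slot, empty))
        for slot, w in weights.items()
    )
    return score / total


def recommend(
    query: DisputeDocument,
    corpus: List[DisputeDocument],
    weights: Dict[str, float],
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Filter the corpus by category, then rank the rest by event similarity."""
    scored = [
        (d.doc_id, event_similarity(query.event, d.event, weights))
        for d in corpus
        if d.category == query.category and d.doc_id != query.doc_id
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Invented sample data, for illustration only.
weights = {"treatment": 1.0, "harm": 1.0, "fault": 1.0}
query = DisputeDocument("q", "surgery", {
    "treatment": frozenset({"appendectomy"}),
    "harm": frozenset({"infection"}),
    "fault": frozenset({"delayed diagnosis"}),
})
corpus = [
    DisputeDocument("d1", "surgery", {
        "treatment": frozenset({"appendectomy"}),
        "harm": frozenset({"infection"}),
        "fault": frozenset({"informed consent"}),
    }),
    DisputeDocument("d2", "surgery", {
        "treatment": frozenset({"cesarean"}),
        "harm": frozenset({"hemorrhage"}),
        "fault": frozenset({"delayed diagnosis"}),
    }),
    DisputeDocument("d3", "medication", dict(query.event)),
]
results = recommend(query, corpus, weights)
```

In this sketch the category filter removes `d3` before any similarity is computed, even though its event is identical to the query's, which is the point of filtering first: cross-category candidates are never scored. A set-overlap measure was chosen only because it needs no training data; any other event similarity could be substituted in `event_similarity`.
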
Keywords/Search Tags: medical dispute, label matching, event extraction, similarity computation, recommendation system