
Information Extraction Method For Unstructured Pathological Report Based On Pattern Matching

Posted on: 2018-04-09; Degree: Master; Type: Thesis
Country: China; Candidate: Y L Zhang
GTID: 2334330536452497; Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of Internet information technology, the informatization of major hospitals in China has made great progress, and these hospitals have accumulated a wealth of unstructured clinical data. The pathology report is a particularly important type of unstructured clinical document: it is free text written in natural language by the pathologist and records patient information, gross specimen findings, endoscopic findings, and so on. Traditionally, pathology reports are structured by hand, relying on the physician's experience; in essence, the report data are converted into structured form through manual intervention. In the context of big data, however, manual processing is time-consuming and makes it difficult to guarantee data accuracy. Therefore, based on the structural characteristics of pathology text and the writing conventions of pathology reports, this thesis designs and implements a complete system that supports structured information extraction from pathology report data, using rule-based extraction, pattern matching, pattern generalization, and related techniques. The main contents of this thesis are as follows:
1) First, this thesis introduces the techniques used to structure pathology reports, including Chinese word segmentation, information extraction methods, pattern matching algorithms, and the reverse shortest-edit-distance generalization method.
2) It then analyzes the structural characteristics of pathology data, builds a lexicon of pathological sample nouns, and proposes a rule-based algorithm for extracting sample names, which screens candidate subject names by their presence in the lexicon, their part of speech, and their position in the text.
3) Next, it builds an initial library of pathological sample information extracted through manual intervention. On this basis, and taking into account the structural characteristics of pathology reports, it extracts pathological sample patterns with a custom pattern matching algorithm.
4) Building on the reverse shortest-edit-distance generalization method, it proposes a pattern generalization method based on the forward shortest edit distance, which yields general extraction patterns.
5) Finally, it applies the existing patterns to newly entered pathology report data to extract information, achieving real-time structuring.
Tests on real data show that the system achieves a correct rate of 88% while maintaining a recall of 92%, which meets the expected requirements. The system can therefore help doctors improve diagnostic efficiency and can also provide data support for the future pathological diagnosis of disease.
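The rule-based sample-name screening described in item 2) can be sketched roughly as follows. This is a minimal sketch, not the thesis's implementation: it assumes the report text has already been segmented and part-of-speech tagged, uses English stand-ins for Chinese tokens, and the lexicon contents, the noun-tag prefix `n`, the `max_position` cutoff, and the name `extract_sample_name` are all hypothetical choices.

```python
from typing import List, Optional, Set, Tuple

# A token is (word, part-of-speech tag), as a segmenter/tagger might produce.
Token = Tuple[str, str]

# Hypothetical sample-noun lexicon (English stand-ins for Chinese terms).
SAMPLE_LEXICON: Set[str] = {"gastric antrum", "gastric body", "duodenum", "colon"}


def extract_sample_name(
    tokens: List[Token],
    lexicon: Set[str] = SAMPLE_LEXICON,
    max_position: int = 5,
) -> Optional[str]:
    """Return the first token that passes three screens: it is in the lexicon,
    its POS tag marks it as a noun, and it occurs within the first
    `max_position` tokens of the specimen description. Otherwise return None."""
    for position, (word, pos) in enumerate(tokens):
        if position >= max_position:  # position screen: only look near the start
            break
        if word in lexicon and pos.startswith("n"):  # lexicon and POS screens
            return word
    return None


if __name__ == "__main__":
    tokens = [("(1)", "m"), ("gastric antrum", "n"), ("biopsy", "n")]
    print(extract_sample_name(tokens))  # gastric antrum
```

Ordering the screens this way keeps the check cheap: the position cutoff stops the scan early, and the lexicon lookup is a constant-time set membership test.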
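Extracting sample information by pattern matching, as in item 3), might look like the sketch below. The abstract does not specify the custom matching algorithm, so this assumes a deliberately simple pattern format: a list of literal tokens and `<slot>` placeholders, where each slot captures exactly one token. The functions `is_slot` and `match_pattern` and the example pattern are hypothetical.

```python
from typing import Dict, List, Optional


def is_slot(element: str) -> bool:
    """Pattern elements written as <name> are capture slots."""
    return len(element) > 2 and element.startswith("<") and element.endswith(">")


def match_pattern(pattern: List[str], tokens: List[str]) -> Optional[Dict[str, List[str]]]:
    """Slide `pattern` over `tokens`. Literal elements must equal the token at the
    same offset; each <slot> captures exactly one token. Return the captures of
    the leftmost match, or None if the pattern matches nowhere."""
    for start in range(len(tokens) - len(pattern) + 1):
        captured: Dict[str, List[str]] = {}
        for offset, element in enumerate(pattern):
            token = tokens[start + offset]
            if is_slot(element):
                captured.setdefault(element[1:-1], []).append(token)
            elif element != token:
                break  # literal mismatch: try the next start position
        else:
            return captured  # every element matched at this start position
    return None


if __name__ == "__main__":
    pattern = ["<site>", "tissue", "<count>", "pieces"]
    tokens = ["(", "antrum", "tissue", "3", "pieces", ")"]
    print(match_pattern(pattern, tokens))  # {'site': ['antrum'], 'count': ['3']}
```

Because every slot consumes exactly one token, a pattern learned from one report only matches reports with the same token layout; relaxing that restriction is what the generalization step in item 4) is for.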
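Items 1) and 4) generalize patterns using the shortest edit distance. The thesis's specific forward and reverse variants are not described in the abstract, so the sketch below shows a generic alignment-based generalization instead, assuming two patterns are merged by aligning them with token-level Levenshtein distance, keeping the tokens they share and collapsing each run of edits into a wildcard. The names `generalize` and `WILDCARD` are hypothetical.

```python
from typing import List

WILDCARD = "*"  # hypothetical placeholder for positions where the patterns differ


def generalize(a: List[str], b: List[str]) -> List[str]:
    """Merge two token patterns along one minimum-edit-distance alignment.
    Matching tokens are kept; each run of substitutions, insertions and
    deletions collapses into a single WILDCARD."""
    n, m = len(a), len(b)
    # dp[i][j] = edit distance between a[:i] and b[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dp[i][0] = i
    for j in range(m + 1):
        dp[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1, dp[i][j - 1] + 1, dp[i - 1][j - 1] + cost)

    # Trace back from (n, m) to recover one optimal alignment.
    merged: List[str] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and a[i - 1] == b[j - 1] and dp[i][j] == dp[i - 1][j - 1]:
            merged.append(a[i - 1])  # shared token: keep it
            i, j = i - 1, j - 1
            continue
        if not merged or merged[-1] != WILDCARD:
            merged.append(WILDCARD)  # start a new wildcard run
        if i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + 1:
            i, j = i - 1, j - 1  # substitution
        elif i > 0 and dp[i][j] == dp[i - 1][j] + 1:
            i -= 1  # deletion from a
        else:
            j -= 1  # insertion from b
    merged.reverse()
    return merged


if __name__ == "__main__":
    p1 = ["antrum", "tissue", "3", "pieces"]
    p2 = ["antrum", "tissue", "5", "pieces"]
    print(generalize(p1, p2))  # ['antrum', 'tissue', '*', 'pieces']
```

Collapsing consecutive edits into one wildcard keeps the merged pattern short; a real system would also need to decide what a wildcard may match, which this sketch leaves open.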
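For reference, if the reported "correct rate" is read as precision, as it usually is when paired with recall, the two figures correspond to the standard definitions below. Here $TP$, $FP$, and $FN$ are assumed to count correctly extracted, wrongly extracted, and missed items; the abstract does not state the unit of evaluation.

```latex
\[
\text{Precision} = \frac{TP}{TP + FP} = 0.88, \qquad
\text{Recall} = \frac{TP}{TP + FN} = 0.92
\]
```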
Keywords/Search Tags: pathological text, pattern matching, pattern extraction, structured transformation