
Design And Implementation Of The Structured System For Pathological Microscopy Text

Posted on: 2017-03-28
Degree: Master
Type: Thesis
Country: China
Candidate: Q Q Liu
Full Text: PDF
GTID: 2308330503953786
Subject: Software engineering
Abstract/Summary:
The pathology report, a form of unstructured data in medical documentation, is an important document in diagnostic pathology. It is clinical data in text format, based on biopsy results and recorded by pathologists in a standardized form of natural language. This unstructured data is not only the key to the pathologist's diagnosis but also an important tool for clinicians in diagnosing disease.

Currently, a pathological diagnosis is an empirical result that doctors reach through subjective judgment based on the main ideas of the text data. Reading the data is in essence a process of manually extracting information and structuring the text, which is inefficient and cannot guarantee accuracy, and can therefore lead to misdiagnosis. To address this problem, the paper designed and implemented a system for structuring pathological microscopy text, based on the characteristics of that text and using statistical analysis, text clustering, Chinese word segmentation, and related techniques, so that structured data can be extracted automatically.

First, the paper summarized the characteristics of pathological microscopy text as the basis of the solution and carried out text preprocessing, including clause splitting and feature-word marking, according to those characteristics. It then gave the processing flow for structuring the text data. On this basis, it designed the system architecture and introduced the main functions and processing flow of the three main modules: the text preprocessing module, the pathology dictionary construction module, and the text structuring module.

Then, to construct the pathology dictionary, the paper proposed a keyword extraction algorithm based on the characteristics of pathological microscopy text. The algorithm takes as input similar short texts grouped by text clustering and extracts the keywords from these similar short sentences.
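The abstract does not spell out the preprocessing or keyword extraction algorithms, so the following is only a minimal sketch of the general idea, not the thesis's method. It assumes clauses are split on common Chinese and ASCII punctuation, and it swaps in a simple heuristic (character n-grams shared by every clause in a cluster) for the keyword step. The function names, the delimiter set, and the sample clauses are all hypothetical and invented for illustration.

```python
import re
from collections import Counter

# Assumed clause delimiters: common Chinese and ASCII punctuation.
CLAUSE_DELIMITERS = r"[，。；、,;.]"


def split_clauses(text):
    """Split a report into short clauses on punctuation, dropping empties."""
    return [c.strip() for c in re.split(CLAUSE_DELIMITERS, text) if c.strip()]


def shared_ngrams(clauses, min_len=2, max_len=4, min_support=1.0):
    """Return maximal character n-grams found in at least
    min_support * len(clauses) of the given similar clauses.

    This is a stand-in heuristic, not the algorithm from the thesis.
    """
    counts = Counter()
    for clause in clauses:
        seen = set()  # count each n-gram once per clause
        for n in range(min_len, max_len + 1):
            for i in range(len(clause) - n + 1):
                seen.add(clause[i:i + n])
        counts.update(seen)
    threshold = min_support * len(clauses)
    frequent = [g for g, c in counts.items() if c >= threshold]
    # Keep only n-grams that are not contained in a longer frequent n-gram.
    maximal = [g for g in frequent
               if not any(g != h and g in h for h in frequent)]
    return sorted(maximal, key=lambda g: (-len(g), g))


# Invented sample text: three similar clauses describing the cell nucleus.
clauses = split_clauses("细胞核增大，细胞核深染；细胞核不规则。")
print(clauses)                 # ['细胞核增大', '细胞核深染', '细胞核不规则']
print(shared_ngrams(clauses))  # ['细胞核']
```

In this toy cluster the shared substring "细胞核" (cell nucleus) surfaces as the candidate keyword, while the remainder of each clause is what a later step would treat as the description.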
The dictionary, composed of attribute words and descriptive words, was then obtained by extending these words according to the writing patterns and conventions of the text data.

Finally, the paper extracted key-value pairs from the text data using a dictionary-based algorithm for structuring pathological microscopy text, and obtained structured data with complete semantics by adding the negative words found through negation detection. It also implemented a web system through which users can extract structured data, and, to improve accuracy, the system updates the dictionary based on user feedback.

To validate the algorithm, the paper ran experiments on real-world data sets; manual checking showed that the pathology dictionary and the structured data achieved the desired objectives. The approach not only overcomes the poor applicability of general-purpose word segmentation tools to specialized domains and extracts structured data automatically, but can also provide strong data support for future disease analysis.
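As a rough illustration of dictionary-based key-value extraction with negation detection, the sketch below matches an attribute word and any descriptive words in a clause and flags a descriptive word as negated when a negation cue immediately precedes it. The dictionary entries, the negation cues, the function name, and the sample clauses are hypothetical; the abstract does not give the thesis's actual dictionary or matching rules.

```python
# Hypothetical dictionary entries and negation cues, for illustration only.
ATTRIBUTE_WORDS = ("细胞核", "核仁", "间质")
DESCRIPTIVE_WORDS = ("增大", "深染", "明显", "浸润")
NEGATION_CUES = ("未见", "无", "不")


def structure_clause(clause):
    """Return (attribute, description, negated) triples for one clause.

    A descriptive word counts as negated when a negation cue
    appears directly before it in the clause.
    """
    # Prefer the longest matching attribute word.
    attribute = next(
        (a for a in sorted(ATTRIBUTE_WORDS, key=len, reverse=True)
         if a in clause),
        None,
    )
    if attribute is None:
        return []
    records = []
    for word in DESCRIPTIVE_WORDS:
        idx = clause.find(word)
        if idx == -1:
            continue
        negated = any(clause[:idx].endswith(cue) for cue in NEGATION_CUES)
        records.append((idx, attribute, word, negated))
    # Report descriptions in the order they occur in the clause.
    return [(a, w, n) for _, a, w, n in sorted(records)]


print(structure_clause("细胞核增大深染"))
# [('细胞核', '增大', False), ('细胞核', '深染', False)]
print(structure_clause("核仁不明显"))
# [('核仁', '明显', True)]
```

Keeping the negation flag separate from the descriptive word, rather than folding the cue into the value, is one possible design: it lets "核仁不明显" and "核仁明显" share the same key and value and differ only in polarity.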
Keywords/Search Tags: Text Structure, Text Clustering, Chinese Word Segmentation, Pathological Microscopy Text, Keyword Extraction