Research And Implementation Of Structured Processing Of Medical Text Data Based On Spark Platform

Posted on: 2018-03-01

Degree: Master

Type: Thesis

Country: China

Candidate: X H Zhang

GTID: 2348330536452499

Subject: Computer Science and Technology

Abstract/Summary:

Traditionally, medical text is processed manually, relying on doctors' clinical experience. This approach is time-consuming and cannot meet the accuracy required for structured processing. In the era of big data, the growth of medical data brings new challenges to the medical industry: hospitals produce large volumes of medical text while diagnosing and treating patients, and the vast majority of this text is semi-structured or unstructured. Transforming it into structured data that computers can recognize and understand enables new breakthroughs in scientific research, clinical diagnosis and treatment, data sharing, and related areas.

Structured processing of medical text is defined as transforming semi-structured or unstructured medical text into structured text. Current approaches fall into two categories: pre-structured processing, which processes data through a designated system, and post-structured processing, which mainly processes data using natural language processing techniques. The aim of structured processing is to automatically extract each index name and its corresponding parameter value. To this end, this thesis summarizes the structural and linguistic features of medical text and, on that basis, proposes a method for structured processing of medical text. The method has three main parts: text preprocessing, new word discovery, and information extraction. Text preprocessing cleans, integrates, transforms, and standardizes the text data so that it is consistent and provides accurate input for later steps. New word discovery finds medical terms in the text based on word embeddings: Word2vec, Google's open-source word embedding tool, is trained on the medical text and maps each word into an n-dimensional vector space. New words are then identified according to the degree of internal association between words, information entropy, and word frequency, and are added to a user-defined lexicon. Information extraction designs extraction rules for key information; guided by the keywords found during new word discovery, the corresponding key information is extracted. Finally, the extracted information is organized into structured data, which completes the structured processing.

This thesis deploys these three parts on the Spark platform and uses distributed computing to carry out the structured processing of medical text. To verify the feasibility of the proposed method, a portion of the text data is randomly selected as a sample and structured manually to produce reference results. Comparing these reference results with the output of the proposed method shows that the method achieves the expected effect.
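
The abstract does not give the scoring formulas used for new word discovery, so the following is only a minimal sketch of one common way to combine word frequency with information entropy: a candidate string is kept when it occurs often enough and the characters on its left and right sides are varied enough. The function names, the thresholds (min_freq, min_entropy), and the n-gram candidate generation are assumptions made for illustration; the sketch leaves out the word embedding step and the internal association measure, and it runs on a single machine rather than on Spark.

```python
import math
from collections import Counter, defaultdict


def boundary_entropy(neighbors: Counter) -> float:
    """Shannon entropy (in bits) of the characters adjacent to a candidate."""
    total = sum(neighbors.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in neighbors.values())


def candidate_stats(text: str, max_len: int = 4):
    """Count every character n-gram (2..max_len) and its left/right neighbors."""
    freq = Counter()
    left = defaultdict(Counter)
    right = defaultdict(Counter)
    for n in range(2, max_len + 1):
        for i in range(len(text) - n + 1):
            gram = text[i:i + n]
            freq[gram] += 1
            if i > 0:
                left[gram][text[i - 1]] += 1
            if i + n < len(text):
                right[gram][text[i + n]] += 1
    return freq, left, right


def discover_new_words(text: str, min_freq: int = 3,
                       min_entropy: float = 1.0, max_len: int = 4):
    """Return (candidate, frequency, entropy) tuples that pass both thresholds.

    The thresholds are hypothetical defaults, not values from the thesis.
    """
    freq, left, right = candidate_stats(text, max_len)
    found = []
    for gram, count in freq.items():
        if count < min_freq:
            continue
        # Use the weaker side: a real word should vary freely on both sides.
        entropy = min(boundary_entropy(left[gram]), boundary_entropy(right[gram]))
        if entropy >= min_entropy:
            found.append((gram, count, entropy))
    # Most frequent candidates first, ties broken by higher entropy.
    return sorted(found, key=lambda item: (-item[1], -item[2]))
```

Candidates that pass both thresholds would then be appended to the user-defined lexicon used by the word segmenter. In a distributed setting, the counting step could in principle be expressed as a map over text partitions followed by a reduce that merges the counters.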

Keywords/Search Tags:

Medical Text Structure, Chinese Word Segmentation, Word Embedding, Information Entropy, Information Extraction

Related items

1	Research On Chinese Word Segmentation Integrating Pinyin And Tone Information
2	Chinese Word Auto-segmentation Design And Algorithm Realization For Chinese Network Information Retrieval
3	Research On Chinese Word Segmentation Method Based On Word Embedding
4	The Design And Implementation Of Text Topic Key Word Processing System Based Chinese Word Segmentation
5	Study On The System Of Chinese Automatic Word Segmentation Based On Text Information Of BBS
6	Research Of Chinese Text Categorization Algorithms Based On Information Entropy
7	Research And Application Of Internet Chinese Text Classification
8	The Research Of Chinese Word Segmentation Disambiguation Based On Word Environment Information
9	Research On Cross-domain Chinese Word Segmentation Method Based On New Word Discovery
10	Research On Large-Scale Chinese People Information Extraction Based On Web