
An Automatic Approach Towards Constructing Chinese Medical Terminology Resource

Posted on: 2017-03-24
Degree: Master
Type: Thesis
Country: China
Candidate: M Z Ju
GTID: 2308330485957113
Subject: Biomedical engineering
Abstract/Summary:
Medical terminology resources are crucial to medical language processing. Wide-coverage, well-defined medical terminology concepts and the relationships between them are fundamental to automatically processing medical narrative texts and to building artificial intelligence applications such as information extraction, text understanding and knowledge discovery. Clinical free texts, an efficient form of recording clinical data, contain a large amount of clinical information that is unavailable from other data sources. As medical informatics develops, clinical free texts are accumulating rapidly, which offers an unprecedented opportunity to use them in support of higher-level clinical research. Unfortunately, the lack of Chinese medical terminology resources hinders the advancement of related techniques and research. Constructing a large-scale medical terminology is time-consuming and requires continuous maintenance, which makes it one of the three acknowledged challenges in natural language processing. In response to these problems, we proposed an NLP-based approach to automatically build a Chinese medical terminology, and used it to construct a Chinese medical terminology resource. In addition, we conducted a comprehensive evaluation of the quality and coverage of the resulting medical lexicon, and carried out several applications to validate its practicability. The details are as follows.

First, based on the conditional random field (CRF) algorithm, we proposed an iterative, automatic method for discovering new medical terms in a clinical corpus, given only a lightweight dictionary of specific semantic types. Functionally, the method discovers new terms from a given corpus and a set of semantic dictionaries.
This approach greatly improves the efficiency of terminology construction and saves considerable time and human effort.

Second, we translated terms of three common semantic types in UMLS using machine translation and integrated them into the large-scale terminology constructed by the CRF-based approach. To assess the quality of the lexicon, we evaluated precision on a random sample of entries and evaluated coverage on a small-scale corpus. The high precision and coverage indicate that the constructed medical terminology provides a good foundation for further research in our group.

Last, we carried out three applications on a clinical corpus using the constructed lexicon. First, we compared different clinical departments in terms of the TF-IDF of their symptoms. Second, we built a knowledge base containing 10,292 symptom-body part relation pairs. Third, we analyzed the corpus with respect to part of speech, semantics and sublanguage patterns. These applications provide rich quantitative indicators for further understanding of grammar, semantics and pragmatics.

In summary, we proposed an efficient method for building a large-scale medical terminology resource. It provides a powerful computational tool for constructing medical term knowledge bases and facilitates medical language processing, enabling efficient use of clinical information.
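The abstract does not say how the lightweight semantic dictionary seeds the CRF. One common arrangement, assumed here and not confirmed by the source, is distant supervision: the dictionary labels corpus characters with BIO tags via forward maximum matching, a CRF is trained on those labels and used to tag the corpus, and any predicted spans not yet in the dictionary are added before the next round. The sketch below covers only the two dictionary-side steps; the CRF training itself would come from a separate library (e.g. CRFsuite), and the function names `bio_label` and `extract_terms` are hypothetical.

```python
from typing import Dict, List, Tuple


def bio_label(sentence: str, dictionary: Dict[str, str]) -> List[str]:
    """Tag each character as B-<TYPE>, I-<TYPE> or O using forward maximum matching.

    Assumption: this is how dictionary entries become CRF training labels.
    """
    max_len = max((len(term) for term in dictionary), default=0)
    tags = ["O"] * len(sentence)
    i = 0
    while i < len(sentence):
        # Try the longest candidate starting at position i first.
        for length in range(min(max_len, len(sentence) - i), 0, -1):
            word = sentence[i:i + length]
            if word in dictionary:
                sem_type = dictionary[word]
                tags[i] = "B-" + sem_type
                for j in range(i + 1, i + length):
                    tags[j] = "I-" + sem_type
                i += length
                break
        else:
            i += 1  # no dictionary entry starts here
    return tags


def extract_terms(sentence: str, tags: List[str]) -> List[Tuple[str, str]]:
    """Collect (term, semantic type) spans from a BIO tag sequence,
    e.g. the output of a trained CRF, so new terms can be merged back."""
    terms: List[Tuple[str, str]] = []
    start, sem_type = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel flushes the final span
        if start is not None and not (tag.startswith("I-") and tag[2:] == sem_type):
            terms.append((sentence[start:i], sem_type))
            start, sem_type = None, None
        if tag.startswith("B-"):
            start, sem_type = i, tag[2:]
    return terms


# Toy dictionary (hypothetical entries): the longer match 头痛 wins over 头.
dictionary = {"头痛": "SYMPTOM", "头": "BODY", "腹部": "BODY"}
sentence = "患者头痛伴腹部不适"
print(extract_terms(sentence, bio_label(sentence, dictionary)))
# [('头痛', 'SYMPTOM'), ('腹部', 'BODY')]
```

In the assumed iterative loop, `extract_terms` would be applied to CRF predictions rather than to dictionary labels, and spans absent from `dictionary` would be reviewed and added before retraining.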
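The first application compares departments by the TF-IDF of their symptoms. The abstract does not give the exact weighting, so the sketch below assumes one plausible setup: each department is treated as a "document" whose terms are its symptom mentions, TF is the mention count divided by the department's total mentions, and IDF is ln(N / df), where N is the number of departments and df is the number of departments mentioning the symptom. The department names and symptom lists are invented toy data.

```python
import math
from collections import Counter
from typing import Dict, List


def symptom_tfidf(dept_symptoms: Dict[str, List[str]]) -> Dict[str, Dict[str, float]]:
    """TF-IDF of symptoms per department (assumed variant: normalized TF, ln IDF)."""
    n_depts = len(dept_symptoms)
    counts = {dept: Counter(symptoms) for dept, symptoms in dept_symptoms.items()}
    # Document frequency: in how many departments each symptom appears.
    df: Counter = Counter()
    for c in counts.values():
        df.update(c.keys())
    scores: Dict[str, Dict[str, float]] = {}
    for dept, c in counts.items():
        total = sum(c.values())
        scores[dept] = {
            sym: (n / total) * math.log(n_depts / df[sym]) for sym, n in c.items()
        }
    return scores


def top_symptoms(scores: Dict[str, Dict[str, float]], dept: str, k: int = 3) -> List[str]:
    """Symptoms most characteristic of one department (ties broken by name)."""
    ranked = sorted(scores[dept].items(), key=lambda kv: (-kv[1], kv[0]))
    return [sym for sym, _ in ranked[:k]]


# Hypothetical symptom mentions extracted from two departments' records.
scores = symptom_tfidf({
    "neurology": ["headache", "headache", "fever", "cough"],
    "gastroenterology": ["fever", "abdominal pain", "abdominal pain", "abdominal pain"],
})
print(top_symptoms(scores, "neurology", 2))  # ['headache', 'cough']
```

Under this weighting a symptom mentioned in every department (here "fever") scores zero, so the ranking surfaces symptoms that distinguish one department from the others.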
Keywords: Medical Language Processing, Medical Terminology, Conditional Random Field, Clinical Corpus