
Precision Medicine Corpus Annotation In Liver Neoplasms

Posted on: 2019-01-20
Degree: Master
Type: Thesis
Country: China
Candidate: S Yang
GTID: 2394330542497333
Subject: Military Preventive Medicine
Abstract/Summary:
With the advent of precision medicine, quickly and accurately extracting valuable, usable information from massive volumes of data has become a difficult problem for researchers. It is also an important means of improving the efficiency of biomedical research and of finding reliable evidence for clinical diagnosis. To address this problem, biomedical named entity recognition and semantic relation extraction technologies have developed rapidly. As the foundation and a key link of text mining technology, corpus construction has become increasingly important. Studies have shown that corpora are essential for improving the recognition accuracy of text mining technology, and that the shortage of corpora is one of the bottlenecks restricting its rapid development. Current biomedical corpora cover few entity types and only simple entity relationships, and therefore cannot meet the needs of building a precision medicine knowledge base. To address this, this research takes the liver neoplasm literature in the CTD database as the text to be annotated, constructs a precision medicine corpus for liver neoplasms, and develops corpus annotation guidelines. The aim is to provide reliable data support for research on precision medicine named entity recognition and semantic relation extraction, and to alleviate the shortage of available annotated corpora.

The thesis is organized as follows. The first part introduces the background and the state of technological development of the research topic, explains the theoretical and practical significance of the study, and sets out the research objectives, content, methods, and technical route. The second part systematically analyzes and describes the current state of research on corpora, ontologies, and related topics, and selects the tools and methods used in this research. The third part describes the overall corpus annotation process, including the selection of texts to be annotated, the preparation of annotation tools, the formulation and optimization of annotation guidelines, the annotation of named entities and entity relationships, and a summary of the problems found during pre-annotation. The fourth part presents the annotation results in detail, including an interpretation of the final output of the Brat manual annotation tool, statistics on the annotation results, and a comparative analysis of the manual annotations and the automatic annotations produced by PubTator. The fifth part summarizes the most important annotation guidelines in this study, covering the explicit annotation of objects, compound nouns, proteins and genes, and the full names and abbreviations of nouns. The sixth part summarizes the work of this research and looks ahead to possible applications of its annotation guidelines and to the development of research on ontology-based corpus annotation.

This study drew on the construction processes of existing corpora to draft initial corpus annotation guidelines, which were then revised step by step through pre-annotation. In constructing the corpus, a precision medicine ontology covering six dimensions (human phenotypes, diseases, chemicals and drugs, cellular mechanisms, molecular mechanisms, and genetic mechanisms) was used for the first time. This ontology involves a wider range of entity types, richer entity relationships, and more detailed definitions, which ensures the quality of the corpus. The study ultimately annotated 10,045 named entities and 2,489 semantic relationships. The thesis also sets out the process for selecting texts to annotate and for formulating corpus annotation guidelines, and summarizes a large number of typical examples of named entity and entity relationship annotation, which can serve as a valuable reference for future biomedical corpus construction tasks.

This study relies mainly on manual annotation, which is costly and not well suited to building large-scale corpora. However, manually annotated corpora are regarded as gold-standard corpora, and their annotation quality is much higher than that of automatically annotated corpora. The corpus constructed in this study can therefore serve as a “seed” from which one or more supervised classifiers can iteratively expand the corpus.
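The fourth part refers to interpreting Brat's output, compiling annotation statistics, and comparing manual with PubTator annotations. As an illustration only, the sketch below reads brat's standoff `.ann` format (text-bound `T` lines and binary relation `R` lines), counts entity and relation types, and scores one annotation set against another by exact matching of entity type and character offsets. The entity and relation type names, the sample annotations, and the helper names `parse_ann` and `prf` are hypothetical. The thesis does not state which matching criterion it used, and PubTator output would first need converting to comparable offsets.

```python
"""Sketch: read brat standoff (.ann) annotations, tally entity and relation
types, and compare two annotation sets by exact span-and-type matching.

Assumptions (illustrative, not taken from the thesis): type names such as
"Disease", "Gene" and "Association" are hypothetical, and automatic
annotations (e.g. from PubTator) are assumed to be converted beforehand to
the same standoff form with matching character offsets.
"""
from collections import Counter
from typing import NamedTuple


class Entity(NamedTuple):
    id: str
    type: str
    start: int
    end: int
    text: str


class Relation(NamedTuple):
    id: str
    type: str
    arg1: str
    arg2: str


def parse_ann(ann_text: str) -> tuple[list[Entity], list[Relation]]:
    """Parse text-bound ("T") and relation ("R") lines of a brat .ann file."""
    entities, relations = [], []
    for line in ann_text.splitlines():
        if not line.strip():
            continue
        ann_id, body, *rest = line.split("\t")
        if ann_id.startswith("T"):
            type_, offsets = body.split(" ", 1)
            # Discontinuous spans look like "0 5;10 15"; keep outer bounds only.
            spans = [tuple(map(int, s.split())) for s in offsets.split(";")]
            entities.append(Entity(ann_id, type_, spans[0][0], spans[-1][1],
                                   rest[0] if rest else ""))
        elif ann_id.startswith("R"):
            type_, a1, a2 = body.split()
            # Arguments look like "Arg1:T2"; keep only the entity id.
            relations.append(Relation(ann_id, type_, a1.split(":", 1)[1],
                                      a2.split(":", 1)[1]))
        # Other line kinds (events "E", attributes "A", notes "#") are skipped.
    return entities, relations


def prf(gold: list[Entity], pred: list[Entity]) -> tuple[float, float, float]:
    """Exact-match precision, recall and F1 on (type, start, end) triples."""
    g = Counter((e.type, e.start, e.end) for e in gold)
    p = Counter((e.type, e.start, e.end) for e in pred)
    tp = sum((g & p).values())  # multiset intersection = true positives
    precision = tp / sum(p.values()) if p else 0.0
    recall = tp / sum(g.values()) if g else 0.0
    f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    manual = (
        "T1\tDisease 0 15\tliver neoplasms\n"
        "T2\tGene 25 29\tTP53\n"
        "R1\tAssociation Arg1:T2 Arg2:T1\n"
    )
    auto = "T1\tDisease 0 15\tliver neoplasms\n"
    gold_ents, gold_rels = parse_ann(manual)
    auto_ents, _ = parse_ann(auto)
    # Per-type statistics for the manually annotated set.
    print(Counter(e.type for e in gold_ents), Counter(r.type for r in gold_rels))
    print(prf(gold_ents, auto_ents))  # -> (1.0, 0.5, 0.6666666666666666)
```

Exact matching is the strictest choice; an overlap-based criterion would instead give partial credit for boundary disagreements (for example "liver neoplasms" versus "neoplasms"), which tend to be frequent when human annotations are compared with tool output.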
Keywords/Search Tags: liver neoplasms, precision medicine, corpus annotation, ontology