
Extraction Of Tumor Information From Chinese Imaging Reports For TNM Staging

Posted on: 2021-04-01
Degree: Master
Type: Thesis
Country: China
Candidate: Y H Wang
Full Text: PDF
GTID: 2404330605956679
Subject: Biomedical engineering
Abstract/Summary:
TNM clinical staging is a key step in cancer diagnosis and treatment. However, staging relies on a large amount of information from different sources, and doctors have limited time to make decisions, so it is difficult to extract the staging-related information accurately. As a result, clinical staging shows large deviations. Using computer technology to supply doctors with the information needed for staging decisions is therefore of great significance for improving the accuracy of clinical cancer staging. Most TNM-related information, however, exists in imaging reports as natural language, which computers cannot use directly, so it is essential to extract this information from imaging reports automatically. Structured TNM staging information can also support treatment plan recommendation, prognosis assessment, and other applications.

Current research on tumor information extraction has the following deficiencies:
1) It is not specialized for TNM staging, and its coverage of staging information is incomplete.
2) The extracted results remain in text form and cannot be used directly for staging decisions.
3) Because of the extraction methods used, no interpretable evidence can be obtained from the report text, which undermines doctors' trust in the system.

To address these deficiencies, this thesis takes lung cancer as the research object and proposes a tumor information extraction scheme for TNM staging from Chinese imaging reports, using CT reports as the example. Literal expressions related to staging are extracted from Chinese imaging reports by information extraction, which consists of named entity recognition and relation extraction. A rule-based method then infers a Boolean or numeric value from the extracted information, with the literal text from the original report attached as evidence, so the result can directly serve downstream decision support and other applications in an interpretable way.

The main contents of this thesis are:
1) Based on the authoritative guideline for TNM staging of lung cancer, the compound staging conditions, which the guideline expresses in words and as combinations of multiple conditions, were decomposed into a number of independent conditions with Boolean or numeric results. These conditions are the staging information that can be used directly for decision support.
2) Through the analysis of 50 real CT reports, 15 entity types and 4 relation types that can be used to infer the above staging conditions were designed, and 342 CT reports were annotated with 6,152 entities and 4,285 relations.
3) BiLSTM+CRF and IDCNN+CRF models incorporating GloVe word embeddings were constructed for named entity recognition, and both models were trained on text segments of three different lengths. The BiLSTM+CRF model trained on segments split at semicolons, the smallest unit, performed best: exact-match precision was 88.94%, recall 90.75%, and F1 89.83%; inexact-match precision was 93.91%, recall 94.97%, and F1 94.41%.
4) A BiLSTM+Attention model with prior knowledge was proposed. The proposed model, an ordinary BiLSTM+Attention model, and a multi-kernel CNN model were constructed, trained, and tested on the 4,285 annotated relations. Five randomized controlled experiments showed that adding prior knowledge improved the results, giving a precision of 96.73%, recall of 96.38%, and F1 of 96.53%, increases of 1.00%, 0.55%, and 0.79%, respectively.
5) A rule-based method for staging condition inference was proposed. Five core inference steps were designed and implemented, applying the named entity recognition and relation extraction results to all staging conditions that CT reports cover. With the gold-standard annotations of the first two steps as input, the inference achieved a precision of 99.83%, recall of 97.75%, and F1 of 98.78%.

The three steps of named entity recognition, relation extraction, and staging inference were then chained to evaluate the proposed scheme as a whole, yielding a precision of 98.33%, recall of 96.20%, and F1 of 97.26%, which demonstrates the effectiveness of the information extraction scheme. The proposed scheme not only achieves high accuracy in extracting TNM clinical staging information for lung cancer from CT reports, but can also be applied to TNM staging information extraction from other imaging reports.
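
The distinction between exact-match and inexact-match scores for named entity recognition can be illustrated with a minimal sketch. The abstract does not define inexact matching, so this example assumes a common convention: a predicted entity counts as an inexact match when it has the same type as a gold entity and their character spans overlap. The `Entity` type, the `evaluate` function, and the entity labels below are hypothetical and are not taken from the thesis.

```python
from typing import List, NamedTuple, Tuple


class Entity(NamedTuple):
    start: int   # inclusive character offset
    end: int     # exclusive character offset
    label: str   # entity type (hypothetical names in this sketch)


def _matches(gold: Entity, pred: Entity, exact: bool) -> bool:
    """Same label, plus identical spans (exact) or overlapping spans (inexact)."""
    if gold.label != pred.label:
        return False
    if exact:
        return gold.start == pred.start and gold.end == pred.end
    # Assumed definition of inexact match: the two spans share a character.
    return gold.start < pred.end and pred.start < gold.end


def evaluate(gold: List[Entity], pred: List[Entity],
             exact: bool = True) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) for one set of gold and predicted entities."""
    pred_hits = sum(any(_matches(g, p, exact) for g in gold) for p in pred)
    gold_hits = sum(any(_matches(g, p, exact) for p in pred) for g in gold)
    precision = pred_hits / len(pred) if pred else 0.0
    recall = gold_hits / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Toy data: the second prediction has the right type but a shifted boundary,
# and the third gold entity is missed entirely.
gold = [Entity(0, 4, "Location"), Entity(5, 9, "Size"), Entity(12, 15, "Shape")]
pred = [Entity(0, 4, "Location"), Entity(6, 9, "Size")]

for mode, flag in (("exact", True), ("inexact", False)):
    p, r, f = evaluate(gold, pred, exact=flag)
    print(f"{mode}: P={p:.2f} R={r:.2f} F1={f:.2f}")
```

On this toy data the boundary error is penalized under exact matching (P=0.50, R=0.33, F1=0.40) but forgiven under inexact matching (P=1.00, R=0.67, F1=0.80), which is why inexact-match scores are never lower than exact-match scores for the same predictions.
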
Keywords/Search Tags: Cancer Staging, Information Extraction, Named Entity Recognition, Relation Extraction