| In recent years, with the arrival of the Big Data era and the rapid development of artificial intelligence, machine learning research on medical informatization and automation has attracted wide attention. Medical informatization and automation enable limited medical resources to be developed effectively and used efficiently, and they promote the development of modern medicine. Realizing them requires many researchers to solve related problems across all aspects of the medical field. This dissertation focuses on two important topics in medical informatization and automation: laboratory testing reduction for inpatients and automatic extraction of medical information from Wikipedia. Frequent laboratory testing of inpatients includes many low-value and unnecessary tests, which wastes medical resources and exposes patients to the risk of hospital-acquired anemia. Many studies therefore focus on reducing unnecessary laboratory tests. Wikipedia contains rich biomedical information and is regarded as an important source of public health and medical information. Extracting medical text and knowledge from Wikipedia helps researchers with further work in the medical field, such as constructing a medical knowledge graph or solving medical natural language processing tasks. This dissertation proposes new ideas and methods for both problems. For the first topic, two machine-learning-based predictive models are proposed, the Self-feeding model and the Corrupted-strategy model. Both mainly use previous laboratory test values to predict the properties of each laboratory test and the necessity of checking it at the current time. The key insight for laboratory testing reduction is that laboratory tests with predictable properties should be reduced first. To this end, novel model architectures and loss functions are devised so that the properties of laboratory tests with lower predicted necessity are predicted more precisely. Based on the predicted necessity of checking each laboratory test, the models can flexibly provide laboratory testing reduction strategies with different reduction proportions. Experiments on the MIMIC-III dataset show that both models predict the properties of the reduced laboratory tests well when a reduction strategy is applied, and that predictive performance improves as the reduction proportion decreases. In particular, the Corrupted-strategy model greatly outperforms the Self-feeding model. For the second topic, a two-step classification workflow based on the internal structure of Wikipedia is proposed. First, a machine-learning-based crawling classifier is developed to identify important medical articles in Wikipedia; it avoids visiting a large number of medically irrelevant articles, which reduces the false discovery rate of the identification. Then, a deep-learning-based multi-label classifier is developed to further identify articles belonging to seven medical semantic groups among the articles selected by the crawling classifier. The proposed method achieves better precision and recall than the baseline models. Based on the identified articles, structured data on medical concepts, including relational knowledge, is extracted from Wikipedia and Wikidata and can supplement the medical relationships in existing knowledge bases. |
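
The idea of deriving reduction strategies with different reduction proportions from the predicted necessity of checking can be illustrated with a minimal sketch. It is not the dissertation's implementation: the function name `select_tests_to_skip`, the test names, and the necessity scores are hypothetical, and the sketch simply assumes that a strategy omits the tests with the lowest predicted necessity until the requested proportion is reached.

```python
from math import floor


def select_tests_to_skip(necessity, proportion):
    """Return the names of the tests to omit under a given reduction proportion.

    necessity:  dict mapping test name -> predicted necessity score
                (assumed convention: higher means more necessary).
    proportion: fraction of the tests to omit, between 0 and 1.
    """
    if not 0.0 <= proportion <= 1.0:
        raise ValueError("proportion must be between 0 and 1")
    n_skip = floor(proportion * len(necessity))
    # Rank from least to most necessary; ties are broken by name so the
    # result is deterministic.
    ranked = sorted(necessity, key=lambda name: (necessity[name], name))
    return ranked[:n_skip]


# Hypothetical necessity scores for one patient at the current time step.
scores = {"hemoglobin": 0.91, "sodium": 0.12, "potassium": 0.35, "creatinine": 0.58}

skip_quarter = select_tests_to_skip(scores, 0.25)  # omits the single least necessary test
skip_half = select_tests_to_skip(scores, 0.5)      # omits the two least necessary tests
```

Under this assumption, a smaller reduction proportion omits only the tests that the model regards as least necessary, which matches the abstract's observation that predictive performance on the reduced tests is better when fewer tests are reduced.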