| Background: To ensure the safety of medicines, drug safety surveillance should span the whole life cycle of a drug; post-marketing surveillance is as important as pre-marketing surveillance. Post-marketing safety surveillance of adverse drug reactions (ADRs) is mainly divided into active surveillance and passive surveillance. At present, passive surveillance is the more widely used approach. However, passive surveillance relies on data from spontaneous reporting systems, which inevitably suffer from under-reporting, duplicate reporting, and inaccurate information, so the signals it yields carry a low level of evidence. Active surveillance treats each project as an independent, planned, and strictly executed study, so its level of evidence is high and it is suited to confirming signals found by passive surveillance, but it is limited by the large amount of human and material resources it requires. With the development of big-data processing technology and the establishment of ADR surveillance systems, active surveillance based on real-world data has become possible. In 2016 China built the Chinese Hospital Pharmacovigilance System (CHPS) and the national ADR sentinel surveillance alliance (CASSA), both important initiatives for active ADR surveillance in China. The CHPS system converts data from sentinel hospitals into CHPS data through its built-in common data model and then carries out active surveillance of ADRs. However, the current common data model of the CHPS system is rather basic: it can integrate data but cannot yet provide standardized data for statistical analysis. At the same time, CHPS data lack targeted identification of adverse drug reaction outcomes, resulting in a lack of clear adverse 
reaction outcome variables in the studies conducted; target adverse reactions can be identified only through triggers built from symptoms, laboratory tests, and therapeutic drugs that point to them, which falls short of accurate identification of adverse drug reactions. Objective: To address the problem of CHPS data export and standardization, this study explores a set of methods and processes for integrating heterogeneous data from multiple sources into standardized data that can be used for statistical analysis. To address the lack of targeted identification of adverse reactions in CHPS data, it explores the method and feasibility of constructing adverse reaction triggers based on machine learning algorithms. Finally, using data on oncology patients from a sentinel hospital, it applies these methods to the adverse reactions of PD-1/PD-L1 inhibitors, providing a new approach to active surveillance of adverse drug reactions in China. Method: 1. For the extraction-transformation-loading (ETL) of CHPS data into standardized data, this study draws on the process used by Observational Health Data Sciences and Informatics (OHDSI) to integrate heterogeneous data from multiple sources into standardized data. First, the CHPS data were scanned and the logical mapping of CHPS was constructed with two small software tools developed by OHDSI, White Rabbit and Rabbit in a Hat; the ETL of CHPS was then constructed according to the study purpose, and the CHPS data were exported and converted into standardized data. 2. To address the lack of targeted identification of adverse drug reactions, in addition to constructing classical triggers for adverse reactions, this study attempts to 
construct machine-learning-based triggers to identify adverse reactions and to test whether they outperform classical triggers. The machine learning triggers are constructed as follows: first, a gold standard database of target adverse reactions is built using the sampling validation method; next, candidate variables for the target adverse reaction triggers are screened by combining literature review and expert consultation; finally, five models (Random Forest, Support Vector Machine, XGBoost, LightGBM, and Logistic Regression) are trained on the gold standard database, and the model with the best predictive performance is used as the machine-learning-based trigger for the target adverse reaction. The predictive performance of the machine learning triggers is then compared with that of the classical triggers to verify whether it has improved. 3. In the case study, this study first cleaned data on PD-1/PD-L1 inhibitors from the FAERS database (the United States Food and Drug Administration Adverse Event Reporting System) covering the first quarter of 2015 to the first quarter of 2022, and selected three potential adverse reactions of high concern to clinicians as the target adverse reactions for active surveillance: thyroid dysfunction (IC/ROR = 3.61/12.18), hyperthyroidism (IC/ROR = 3.88/14.68), and myocarditis (IC/ROR = 4.58/23.97). The gold standard database was then constructed using the sampling validation method, and the variables to be included in the machine learning triggers were screened through literature review and two rounds of expert consultation. Several machine learning models were trained on the gold standard database, and the model with the best predictive efficacy was selected as the trigger for the 
target adverse reactions. Data on inpatient oncology patients from June 1, 2018 to August 1, 2022 were extracted from the CHPS system of a sentinel hospital in Chongqing and divided into a PD-1/PD-L1 inhibitor group and a control group according to whether the patients had received PD-1/PD-L1 inhibitors; the baseline characteristics of the two groups were balanced by propensity score matching. Finally, the prediction models for the target adverse reactions were applied to identify cases with and without the target adverse reactions in both groups, and the groups were compared to explore whether PD-1/PD-L1 inhibitors increase the risk of the three target adverse reactions. Results: 1. Data on 29,417 inpatient oncology patients from June 1, 2018 to August 1, 2022 in the sentinel hospital were extracted using the ETL constructed in advance for CHPS data. The terms of the required data were mapped to standardized terms based on the OHDSI dictionary library to build an active surveillance database for adverse drug reactions that supports statistical analysis. Patients were divided into a PD-1/PD-L1 inhibitor group and a control group based on the presence or absence of PD-1/PD-L1 inhibitor use, and the baselines of the two groups were balanced by 1:4 propensity score matching on patients’ basic information, underlying diseases, and tumor type. After matching, the PD-1/PD-L1 inhibitor group comprised 3,496 patients and the control group 12,083 patients; the absolute value of the standardized mean difference (SMD) for each variable between the two groups was less than 0.1, and the differences between the two groups were not statistically significant. 2. The gold standard database of target adverse reactions was constructed using the sampling validation method; after data verification excluded 18 duplicate cases, a total of 482 patients were included in the gold standard database. The variables included 
in the machine learning trigger were screened through literature review and two rounds of expert consultation. Subsequently, five prediction models (random forest, support vector machine, XGBoost, LightGBM, and logistic regression) were trained on the gold standard database with the variables incorporated into the triggers, and the final machine-learning-based trigger for each target adverse reaction was selected on the basis of predictive efficacy. The models selected were: LightGBM for thyroid insufficiency (AUC/ACC/precision/recall/F1 score = 0.944/0.885/0.872/0.850/0.861); random forest for hyperthyroidism (AUC/ACC/precision/recall/F1 score = 0.988/0.979/0.950/0.950/0.950); and random forest for myocarditis (AUC/ACC/precision/recall/F1 score = 0.988/0.948/1.000/0.545/0.706). This study also evaluated the predictive efficacy of the classical triggers for the target adverse reactions on the gold standard database and found that the machine learning triggers outperformed the classical triggers. 3. The machine learning triggers were applied to the matched data to identify whether the target adverse reactions had occurred. In the PD-1/PD-L1 inhibitor group, 464 cases of thyroid insufficiency, 142 cases of hyperthyroidism, and 64 cases of myocarditis were identified; the differences in the incidence of the three target adverse reactions compared with the control group were statistically significant, indicating that PD-1/PD-L1 inhibitors could increase the risk of all three adverse reactions. Conclusion: With the help of the OHDSI ETL software, common data model, and dictionary library, this study successfully extracted and transformed CHPS data into standardized data for statistical analysis, establishing a CHPS data export and standardization method and 
process, which is feasible. Compared with the classical triggers for identifying adverse drug reactions, the machine learning triggers constructed in this study show higher predictive performance and identify adverse drug reactions more accurately. This study is a preliminary exploration of active drug surveillance based on CHPS data in China, focusing on methods and processes for standardizing and integrating heterogeneous data from multiple sources and on the construction of machine-learning-based triggers, with the aim of providing a new approach to active surveillance of adverse drug reactions in China and better ensuring the safety of people’s drug use. |
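
The F1 scores reported in the Results can be checked against the reported precision and recall, since F1 is the harmonic mean of the two. The following is a minimal sketch for illustration only, not the study's own code; the function name `f1_from_precision_recall` and the dictionary layout are assumptions, and the numbers are copied from the abstract.

```python
def f1_from_precision_recall(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) as stated in the Results of the abstract.
# The keys are illustrative labels, not identifiers from the study.
reported = {
    "thyroid insufficiency (LightGBM)": (0.872, 0.850, 0.861),
    "hyperthyroidism (random forest)": (0.950, 0.950, 0.950),
    "myocarditis (random forest)": (1.000, 0.545, 0.706),
}

for name, (precision, recall, f1_reported) in reported.items():
    f1_computed = f1_from_precision_recall(precision, recall)
    print(f"{name}: computed F1 = {f1_computed:.3f}, reported F1 = {f1_reported:.3f}")
```

For myocarditis, the perfect precision paired with a recall of 0.545 shows why F1 is worth reading alongside AUC and accuracy: the trigger produced no false positives on the gold standard set but missed almost half of the true cases.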