| Background: Disproportionality analysis methods, including the proportional reporting ratio (PRR), reporting odds ratio (ROR), and information component (IC), are widely used in adverse drug event detection, but they use only the drug name and the adverse drug event. Important variables such as age, gender, race, and weight, as well as time trends, are not considered. Disproportionality analysis also suffers from common problems such as a high false-positive rate, and countermeasures such as removing duplicate records and applying shrinkage transformation have yielded only limited improvement. Making full use of the accessible data, adjusting for relevant background factors, and properly detecting adverse drug events have become important and difficult problems in pharmacoepidemiology. The change-point analysis (CPA) model is a time change-point mining method based on maximum likelihood estimation, which can indirectly reflect the time trend of data changes. Existing CPA models mainly include the Binary Segmentation (BinSeg), Segment Neighbourhoods (SegNeigh), and Pruned Exact Linear Time (PELT) algorithms, which rely on indicators such as Mean, Var, and MeanVar to detect data changes. To date, the CPA model has not been applied to adjust for temporal background noise in signal mining of adverse drug events. Notably, because of the complexity of adverse drug event data sources, it is difficult to identify regularities in the data distribution. At the same time, complex clustering methods are hard to apply to tens of millions of adverse event records. As a classic fast clustering method, K-Means (KM) clustering improves computational efficiency while maintaining accuracy, which makes it suitable for adjusting non-chronological background noise such as gender, race, and age in large datasets. Objective: To build different CPA models by combining existing knowledge and choosing appropriate indicators, segmentation methods, and penalties. The CPA models optimized by KM can be used to adjust the background
noise, including the chronological trend, in adverse drug event signal detection, thereby preprocessing the irregular data. Within each cluster, disproportionality analysis based on a shrinkage transformation model is used for signal mining, and the mined signals are weighted and fitted to obtain the final signal. Against a gold standard established from FDA drug package inserts, clinical trials, and case series data, with reference to MedDRA (Medical Dictionary for Regulatory Activities) codes, sensitivity, specificity, and the Youden index are used as evaluation criteria to compare the established algorithms with disproportionality analysis in adverse drug event detection. This study aims to explore the characteristics and value of KM-optimized CPA models and to provide a methodological reference for correcting conventional and chronological background noise in future adverse drug event detection. Methods: 1. CPA models optimized by the KM algorithm: Given the irregularity of adverse drug event data, this study considers a variety of penalties (SIC, HQ, Asymptotic, and Manual), indicators (Mean, Var, and MeanVar), and change-point segmentation methods (BinSeg, SegNeigh, and PELT) to explore suitable CPA models and to adjust conventional and chronological background noise in adverse drug event detection. The KM algorithm is suitable for tens of millions of records because of its high efficiency, accuracy, and low computational burden, so it can be used to adjust conventional background noise in adverse drug event data. CPA models optimized by the KM algorithm aim to correct non-chronological and chronological confounding simultaneously in the growing body of adverse event data, and are combined with disproportionality analysis based on the shrinkage transformation model for adverse drug event mining. 2. Data simulation and verification of different CPA models: Monte Carlo simulation is adopted in the verification of
different CPA models; time series are simulated under assumptions about the series length, the frequency distribution of adverse drug events at each time point, the indicator type, the number of change points, and other factors. The established CPA models are applied to the time series simulated under the different assumptions to explore the characteristics of change-point mining in different situations, and each time series is simulated 1000 times in each case. The accuracy rate is defined as the evaluation index for each model. 3. Building the gold standard for signal verification: The gold standard is established from the latest drug package inserts issued by the FDA, combined with published clinical trials, case series studies, and single-case reports, with reference to the PT (preferred term) level of the MedDRA codes used in the FAERS (US Food and Drug Administration Adverse Event Reporting System) database managed by the FDA. 4. Comparison of cardiovascular toxicities of immune checkpoint inhibitors based on the FAERS database: For the selected confounding variables, signal detection is conducted after adjusting for non-chronological and chronological confounding noise. Case studies based on the FAERS database are conducted to detect cardiovascular toxicities of immune checkpoint inhibitors, including PD-1 inhibitors (Pembrolizumab, Nivolumab, and Cemiplimab), PD-L1 inhibitors (Durvalumab, Avelumab, and Atezolizumab), and CTLA-4 inhibitors (Tremelimumab and Ipilimumab). The results are compared with the gold standard library to explore the characteristics of the signal detection results from disproportionality analysis based on the shrinkage transformation model, the single KM algorithm, and the KM-optimized CPA model. Results: 1. Simulation research: (1) The accuracy rates of the different CPA models on simulated time series of different lengths and distributions are relatively similar, although the accuracy rate for the Beta distribution fluctuates greatly. (2) When the time series length, data distribution
type, change-point index type, and other factors are fixed, the accuracy of CPA models based on the Mean and MeanVar indicators is higher than that of models based on the Var indicator, while the accuracy of the Var-based models fluctuates only slightly. Similarly, the PELT segmentation algorithm shows higher accuracy and higher volatility than the other two algorithms. (3) For most CPA models, the accuracy rate on the simulated series increases gradually as the number of change points increases. (4) Once the indicator and segmentation algorithm are fixed, the accuracy rate of each CPA model is similar across penalties and is little affected by the various simulation factors. 2. Case study: (1) Disproportionality analysis based on the shrinkage transformation model: More cardiovascular event reports were detected for Nivolumab monotherapy and polytherapy. Medical staff should be vigilant for Nivolumab-related adverse events such as heart failure, myocarditis, hemoptysis, and ascites, which are frequently reported and have a higher proportion of deaths. Notably, events outside the gold standard show a higher proportion of deaths than established gold standard events. (2) Trend analysis of disproportionality analysis based on the shrinkage transformation model, the single KM algorithm, the single CPA model, and the KM-optimized CPA models: For the same segmentation algorithm and indicator, the results under different penalties do not differ significantly, and the penalty method has less effect than the segmentation algorithm and the indicator. The single CPA algorithm and the KM-optimized CPA model have slightly lower IC025 values than disproportionality analysis based on the shrinkage transformation model, which is consistent with the subsequent sensitivity and specificity results. The
single CPA algorithm and the KM-optimized CPA models based on the Mean and MeanVar indicators have poor sensitivity and better specificity. However, the KM-optimized CPA models based on the Var indicator share higher specificity and lower sensitivity. (3) Signal detection results of disproportionality analysis based on the shrinkage transformation model, the single KM algorithm, the single CPA model, and the KM-optimized CPA models: Among the single CPA algorithm and the KM-optimized CPA models, results for algorithms based on the Var indicator are generally higher than those based on the MeanVar and Mean indicators. Apart from specificity, the Var_BinSeg algorithm, which adjusts only for chronological confounding noise, has advantages in all other indexes, and the Var_PELT and Var_SegNeigh algorithms with chronological noise adjustment show similar trends. Further exploration shows that an appropriate number of change points is highly important for noise adjustment. In all indexes except specificity, the single CPA model is generally better than the KM-optimized CPA models, and the single KM algorithm is generally inferior. Notably, the specificity of the single KM algorithm and the KM-optimized CPA models is generally higher than that of the single CPA model. The best specificity is observed in the KM-optimized CPA models based on the Meanvar_BinSeg and Meanvar_PELT algorithms. Except for the Mean indicator, the BinSeg segmentation algorithm performs relatively better than the PELT and SegNeigh segmentation algorithms under the other two indicators. Under the Mean indicator, the SegNeigh segmentation algorithm is relatively better than or similar to the other two segmentation algorithms. For a given strategy, the sensitivity of each algorithm is relatively stable as the drug-adverse event frequency increases, while the other indexes fluctuate greatly. Generally, specificity decreases faster with increasing drug-adverse event
frequency. Given the relatively stable sensitivity, the agreement rate and the Youden index are more visibly affected by the trend in specificity. Conclusion: (1) In the simulated data, the length and distribution of the time series have little effect on the accuracy rate of each CPA model, so the results of this study can be extrapolated to shorter or longer time series. The accuracy rate fluctuates greatly for data with certain special distributions, which means that results from actual data will not be entirely similar to those from simulated data. (2) When the influencing factors are fixed, the accuracy rate of CPA models based on the Mean and MeanVar indicators is relatively higher than that of CPA models based on the Var indicator, but their results also fluctuate much more. Actual data often involve complex indicators and change quickly, so the more robust Var-based CPA models can perform better in the subsequent case study. Similarly, the PELT segmentation algorithm, which has a relatively high accuracy rate but significant fluctuations, is significantly inferior to the BinSeg and SegNeigh segmentation algorithms in the case study. (3) In the case study, Nivolumab-related cardiovascular adverse events are more frequently reported. Some important adverse events deserve the vigilance of clinical staff, and timely updating of adverse drug event information can reduce their negative impact. (4) Consistent with the simulation study, the penalty method plays only a small role in the case study. For the same drug-related adverse events, algorithms based on the Mean and MeanVar indicators are slightly conservative in some indexes, such as signal sensitivity, compared with disproportionality analysis based on the shrinkage transformation model, while algorithms based on the Var indicator are relatively sensitive. Among the single CPA model and the KM-optimized CPA models, algorithms based on
the Var indicator are generally better than those based on the MeanVar and Mean indicators in all indexes except specificity. Together with the findings from the simulation analysis, this indicates that the data in the case study are highly complex. (5) Apart from specificity, the results of the Var-based algorithms that adjust only for chronological noise are mostly better than those of disproportionality analysis based on the shrinkage transformation model and the single KM algorithm, suggesting that adjusting for chronological noise is important for improving overall sensitivity and detection efficiency. Similarly, correcting conventional confounding can improve the specificity of the algorithm. (6) Most algorithms show little difference in sensitivity between highly reported and rare adverse events, while higher specificity can be observed in highly reported adverse events. This suggests that most algorithms can better exclude negative signals in rare or new adverse events rather than in highly reported adverse events. Similar trends can be observed in the agreement rate and the Youden index. Clarifying the characteristics of a time series helps in selecting the corresponding CPA model. Actual data show complicated distributions and numerous changes, so CPA algorithms with robust results will perform better in practice. In the actual case study, the various algorithms have their own advantages and disadvantages. The single CPA model has higher sensitivity, while the single KM algorithm and the KM-optimized CPA models show higher specificity. The results of some algorithms are greatly affected by the frequency of adverse drug events, so it is highly important to combine multiple algorithms in adverse event signal detection. |
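
As a minimal illustration of the measures named in the abstract, the sketch below computes PRR, ROR, and a shrinkage-adjusted IC from a 2x2 contingency table, and then sensitivity, specificity, and the Youden index against a gold standard set. The function names, the cell labels a, b, c, d, the shrinkage constant of 0.5, and the idea of passing drug-event pairs as Python sets are assumptions made for illustration; this is not the study's implementation and uses none of its data.

```python
import math


def disproportionality(a, b, c, d, shrink=0.5):
    """Disproportionality measures from a 2x2 report-count table.

    Assumed cell layout (hypothetical labels):
      a: target drug,  target event     b: target drug,  other events
      c: other drugs,  target event     d: other drugs,  other events
    """
    n = a + b + c + d
    # PRR: share of the target event among reports for the target drug,
    # relative to the same share among reports for all other drugs.
    prr = (a / (a + b)) / (c / (c + d))
    # ROR: odds of the target event for the target drug vs. other drugs.
    ror = (a * d) / (b * c)
    # Expected count of the drug-event pair if drug and event were independent.
    expected = (a + b) * (a + c) / n
    # IC with additive shrinkage; 0.5 is an assumed, commonly used constant.
    ic = math.log2((a + shrink) / (expected + shrink))
    return {"PRR": prr, "ROR": ror, "IC": ic}


def evaluate(detected, gold, candidates):
    """Sensitivity, specificity, and Youden index for a set of detected signals.

    detected:   pairs flagged as signals by an algorithm
    gold:       pairs listed in the gold standard (true associations)
    candidates: all pairs that were evaluated
    """
    tp = len(detected & gold)
    fp = len(detected - gold)
    fn = len(gold - detected)
    tn = len(candidates - detected - gold)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    youden = sensitivity + specificity - 1
    return sensitivity, specificity, youden
```

In this sketch a pair would typically be flagged as a signal when a chosen measure exceeds a threshold; the threshold and any stratification by cluster or time segment are left out because the abstract does not specify them.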