| Background and Objective: Drug safety is an issue of widespread concern. Many serious adverse events have caused great and irreparable loss to victims and have raised serious public concern, so the importance of adverse drug reaction (ADR) surveillance is self-evident. The ADR spontaneous reporting system (SRS) was introduced in China in 2002 and provides plenty of valuable data for post-marketing surveillance. Individual reports of suspected ADRs are collected and analyzed. The number of spontaneous reports is growing quickly, and finding effective ways to analyze these datasets has become a central and difficult issue. Evaluating the effects of ADR influencing factors helps to identify the features and mechanisms of ADRs, the high-risk populations, and the routes of administration that may cause them. Logistic regression is currently used for influencing factor analysis in SRS and some other ADR datasets. It is an effective tool for both risk factor identification and confounder control. However, its application in SRS databases is restricted for the following reasons. First, logistic regression needs a sufficient sample size, and the required sample size grows as the number of influencing factors analyzed grows. Second, the dependent variables of the logistic regressions currently used in SRS databases are mainly binary. In these studies, all ADRs other than the target ADR were treated as the control group, and patient age, patient gender, combined medications, and complications were taken as influencing factors. Random forest, by contrast, has no sample size requirement and can handle datasets that have more variables than cases. The response variable of a random forest can have up to 32 classes, so more ADRs can be included in the analysis.
Random forest can also estimate missing values effectively, which makes it suitable for SRS datasets, where missing values are common. The aim of this research is to apply a new method, random forest, to influencing factor analysis in light of the limitations of the current methods described above. Real data were used to demonstrate the feasibility of random forest in SRS datasets and to analyze its advantages over logistic regression. Materials and Methods: The cytarabine event, which took place in Shanghai and some other places in China from July to September 2007, was taken as the example for this research. A total of 94 individual reports were collected from the SRS databases of the Shanghai centre for ADR surveillance. A random forest is composed of many decision trees and is used for prediction and classification. It has been widely used to predict risk factors and analyze interactions in medicine, biology, physics and other fields. Random forest was used to analyze the dataset and identify important influencing factors. An improved random forest was used to control the bias caused by correlated variables. The results calculated by random forest were compared with the real event in order to validate the feasibility of random forest in SRS datasets. Results: The main influencing factors of the cytarabine event were the company (company H), the route of administration (intrathecal route), the season when the event took place (July to September), and a relatively long time to onset (more than 7 days). According to the results of random forest and improved random forest, “company”, “route of administration”, “season” and “time to onset” are important influencing factors with higher variable importance scores. Most factors for “cytarabine-muscular weakness” and “cytarabine-paraplegia” have higher variable importance scores than those for “cytarabine-rash” and “cytarabine-pyrexia”.
Muscular weakness and paraplegia are therefore more strongly related to those influencing factors, which indicates that they are the suspected adverse events. Improved random forest can reduce the potential bias induced by the correlation between dosage form and route of administration. The cytarabine event was caused by impurities introduced into two batches of cytarabine injection during the manufacturing process. These injections caused nervous system disorders such as muscular weakness and paraplegia via the intrathecal route. The results of random forest and improved random forest were consistent with the real event. Conclusions: The results show that random forest can identify important influencing factors from complex datasets and estimate their influence on the occurrence of adverse events. Random forest can also reduce the bias from confounding and correlated factors and so give more precise results. Random forest may therefore be an effective tool for influencing factor analysis in ADR SRS databases, and may help in identifying risk factors, detecting ADR signals, assessing causal relationships and guiding clinical medication. |
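
The variable importance scores referred to in the abstract can be illustrated with a minimal sketch. It is not the study's analysis: the abstract does not name the software or the improved variant used, so the sketch hand-writes a small forest and computes the classic out-of-bag permutation importance as a stand-in. The reports are synthetic and generated by an invented rule (they are not the 94 cytarabine reports), factors are assumed to be binary-coded, and every name (FEATURES, make_synthetic_reports, forest_importance) is hypothetical.

```python
import random
from collections import Counter

# Hypothetical binary factors; only the first two drive the synthetic outcome.
FEATURES = ["company_H", "intrathecal", "summer", "late_onset", "female"]

def make_synthetic_reports(n=300, seed=0):
    """Invented data: neurological events cluster where company_H and intrathecal co-occur."""
    rng = random.Random(seed)
    rows, labels = [], []
    for _ in range(n):
        row = [rng.randint(0, 1) for _ in FEATURES]
        if row[0] == 1 and row[1] == 1 and rng.random() < 0.8:
            labels.append(rng.choice(["muscular weakness", "paraplegia"]))
        else:
            labels.append(rng.choice(["rash", "pyrexia"]))
        rows.append(row)
    return rows, labels

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def majority(labels):
    return Counter(labels).most_common(1)[0][0]

def build_tree(rows, labels, mtry, rng, depth=0, max_depth=4):
    """Grow one tree; each node tries a random subset of mtry features."""
    if depth == max_depth or len(set(labels)) == 1:
        return majority(labels)  # leaf: a class label (str)
    best = None
    for j in rng.sample(range(len(rows[0])), mtry):
        left = [i for i, r in enumerate(rows) if r[j] == 0]
        right = [i for i, r in enumerate(rows) if r[j] == 1]
        if not left or not right:
            continue
        # Size-weighted impurity of the two child nodes (lower is better).
        score = sum(len(side) * gini([labels[i] for i in side]) for side in (left, right))
        if best is None or score < best[0]:
            best = (score, j, left, right)
    if best is None:
        return majority(labels)
    _, j, left, right = best
    children = [
        build_tree([rows[i] for i in side], [labels[i] for i in side],
                   mtry, rng, depth + 1, max_depth)
        for side in (left, right)
    ]
    return (j, children[0], children[1])  # internal node: (feature, if 0, if 1)

def predict(tree, row):
    while isinstance(tree, tuple):
        j, if_zero, if_one = tree
        tree = if_one if row[j] == 1 else if_zero
    return tree

def forest_importance(rows, labels, n_trees=100, seed=0):
    """Mean drop in out-of-bag accuracy after permuting each feature."""
    rng = random.Random(seed)
    n, p = len(rows), len(rows[0])
    mtry = max(1, int(p ** 0.5))
    drops = [0.0] * p
    for _ in range(n_trees):
        boot = [rng.randrange(n) for _ in range(n)]      # bootstrap sample
        oob = sorted(set(range(n)) - set(boot))          # reports not drawn
        if not oob:
            continue
        tree = build_tree([rows[i] for i in boot], [labels[i] for i in boot], mtry, rng)
        base = sum(predict(tree, rows[i]) == labels[i] for i in oob)
        for j in range(p):
            shuffled = [rows[i][j] for i in oob]
            rng.shuffle(shuffled)
            hits = 0
            for i, value in zip(oob, shuffled):
                permuted = rows[i][:j] + [value] + rows[i][j + 1:]
                hits += predict(tree, permuted) == labels[i]
            drops[j] += (base - hits) / len(oob)
    return [d / n_trees for d in drops]

if __name__ == "__main__":
    rows, labels = make_synthetic_reports()
    for name, score in zip(FEATURES, forest_importance(rows, labels)):
        print(f"{name:12s} {score:+.3f}")
```

In this synthetic setting the two factors that drive the outcome should receive clearly higher scores than the unrelated ones, which is the pattern the abstract reports for company and route of administration. Note that plain permutation importance does not correct for correlated predictors such as dosage form and route of administration; that correction is what the improved random forest in the study is said to address.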