| The successful development of COVID-19 vaccines in China shows the importance of clinical trial data management. Unplanned adverse physical signs that subjects show at any point during a clinical trial are called adverse events. The recording of adverse events bears directly on the success or failure of the whole trial, yet misreporting, omission, and even concealment are common in the collection process. The drugs taken to alleviate these adverse signs are called concomitant medication; the clinical supervisor collects the subjects' concomitant medication records and matches them to adverse events using pharmacological knowledge. In large-scale clinical trials, however, supervisors often lack sufficient pharmacological knowledge, which greatly reduces the efficiency of clinical trial management. To address this problem, many scholars have used data mining to analyze the relationship between adverse events and concomitant medication. Most of these studies match records by their recorded time, but when several records occur at the same time, the specific correspondence cannot be distinguished. This paper therefore imitates the traceability principle followed by clinical supervisors: it introduces the drug instructions of the concomitant medications, uses text mining to acquire medical knowledge automatically, and uses the results as a bridge for mining the association between adverse events and concomitant medication records. On this basis, the experimental results are applied to a clinical trial data management platform, offering a new approach to adverse event verification. The main work is as follows: 1. To ensure the authenticity and validity of the experimental results, actual adverse event and concomitant medication records from the Adverse Event Reporting System of the US Food and Drug Administration are taken as the research object. The principle by which clinical supervisors trace records manually using pharmacological knowledge is analyzed, and the corresponding drug instructions are obtained with a web crawler. Because drug instructions are unstructured text, they are given a word embedding representation using the Word2vec model. 2. Adverse events are vectorized as a weighted average of the word vectors learned from the drug instructions. For concomitant medication records, where the number of drugs differs greatly between categories, this paper proposes a long-text vector representation based on an improved TF-IDF, which mitigates the effect of class imbalance on the resulting vectors. The improved method is verified and analyzed on real data sets. 3. Text similarity analysis of adverse events and concomitant medication records shows that a small number of key records on both sides occupy an important position, and that the substitutability of concomitant medications is the main reason for the complex correlation between the two. The association results based on word vectors are also shown to be superior to those based on time alone. Finally, the association rules obtained are applied to the developed clinical trial data management platform to help clinical supervisors check adverse events. |
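
The matching idea summarized above (word vectors learned from drug instructions, a TF-IDF-weighted average to represent each text, and text similarity to link an adverse event to a drug) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the three-dimensional `TOY_VECTORS` stand in for Word2vec embeddings, the two `INSTRUCTIONS` entries are invented token lists, cosine similarity is assumed as the similarity measure, and plain smoothed TF-IDF is used in place of the paper's improved variant, whose details are not given here.

```python
import math
from collections import Counter

# Hypothetical 3-dimensional word vectors; in practice these would come from
# a Word2vec model trained on the crawled drug instructions.
TOY_VECTORS = {
    "headache": [0.9, 0.1, 0.0],
    "pain": [0.8, 0.2, 0.1],
    "fever": [0.7, 0.3, 0.0],
    "nausea": [0.1, 0.9, 0.1],
    "vomiting": [0.0, 0.8, 0.2],
    "relieves": [0.3, 0.3, 0.3],
}

# Hypothetical tokenized drug instructions, keyed by an invented drug name.
INSTRUCTIONS = {
    "drug_a": ["relieves", "headache", "pain", "fever"],
    "drug_b": ["relieves", "nausea", "vomiting"],
}


def idf_weights(documents):
    """Smoothed inverse document frequency for every token in the corpus."""
    n = len(documents)
    df = Counter(tok for doc in documents for tok in set(doc))
    return {tok: math.log((1 + n) / (1 + c)) + 1.0 for tok, c in df.items()}


def text_vector(tokens, vectors, idf):
    """TF-IDF-weighted average of the word vectors of the known tokens."""
    tf = Counter(t for t in tokens if t in vectors)
    dim = len(next(iter(vectors.values())))
    total = [0.0] * dim
    weight_sum = 0.0
    for tok, count in tf.items():
        w = count * idf.get(tok, 1.0)  # unseen tokens get a neutral weight
        weight_sum += w
        for i, x in enumerate(vectors[tok]):
            total[i] += w * x
    if weight_sum == 0.0:
        return total  # no known tokens: return the zero vector
    return [x / weight_sum for x in total]


def cosine(a, b):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def best_match(event_tokens, instructions, vectors):
    """Return the drug whose instruction text is most similar to the event."""
    idf = idf_weights(list(instructions.values()))
    event_vec = text_vector(event_tokens, vectors, idf)
    scores = {
        drug: cosine(event_vec, text_vector(toks, vectors, idf))
        for drug, toks in instructions.items()
    }
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    drug, scores = best_match(["headache"], INSTRUCTIONS, TOY_VECTORS)
    print(drug, scores)
```

In this toy setting the token "relieves" appears in every instruction, so its IDF weight is the lowest and it contributes least to each text vector; that is the role TF-IDF weighting plays in the averaged representation.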