Identification Of Carcinogenic Chemicals Based On Network Embedding And Text Correlation

Posted on: 2022-06-10
Degree: Master
Type: Thesis
Country: China
Candidate: X F Peng
Full Text: PDF
GTID: 2504306728986539
Subject: Computer technology
Abstract/Summary:
Cancer is the second leading cause of human death worldwide. Many factors have been shown to cause cancer, and carcinogenic chemicals are widely recognized as an important one. However, traditional methods for identifying carcinogenic chemicals do not perform well enough and are inefficient. An effective and widely applicable computational method is therefore urgently needed. This study proposes several new computational models for identifying carcinogenic chemicals. The main contents are as follows:

First, a model for identifying carcinogenic chemicals based on text correlation is proposed. The carcinogenic and non-carcinogenic chemicals obtained from the Carcinogenic Potency Database (CPDB) are taken as the original data set. Using a text mining method, the correlations between these chemicals are extracted from the relevant literature, and a model based on text correlation is established. The F-measure and AUC of the model are 0.749 and 0.738, respectively. The advantages of the model are then analyzed by comparing it with a model that uses a classical chemical coding scheme and with other correlation-based models. The results show that the proposed model outperforms these models.

Second, a model for identifying carcinogenic chemicals based on a network embedding algorithm is proposed. It uses the same data set as the text correlation-based model. All chemicals are represented by features derived from a chemical network via a network embedding method, mashup. The obtained features are fed into a random forest to build a network-based random forest model. The F-measure and AUC of this model are 0.739 and 0.755, respectively. It is also compared with the model that uses a classical chemical coding scheme and with correlation-based models. The results show that the model is superior to these models.

Finally, the third method identifies carcinogenic chemicals using the Random Walk with Restart (RWR) algorithm. The "Combined_score" values between chemicals are used to construct the network. The RWR algorithm, with carcinogenic chemicals as seed nodes, is applied to this network to discover novel carcinogenic chemicals. The leave-one-out results indicate that the AUC of the model is only about 0.5, meaning that the application of the RWR algorithm to identifying carcinogenic chemicals still needs further improvement.
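
The RWR step described in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation: the function name `random_walk_with_restart`, the restart probability of 0.8, the convergence tolerance, and the toy edge weights (standing in for "Combined_score" values) are all assumptions made for illustration. The update used is the standard one, p(t+1) = (1 - r) * W * p(t) + r * p(0), where W is the weight matrix normalized so that each node's outgoing weights sum to 1 and p(0) puts equal mass on the seed nodes.

```python
def random_walk_with_restart(weights, seeds, restart=0.8, tol=1e-8, max_iter=1000):
    """Score every node of an undirected weighted graph by RWR from the seeds.

    weights: dict mapping (u, v) pairs to a positive edge weight
             (hypothetical stand-in for a "Combined_score").
    seeds:   iterable of seed nodes (e.g. known carcinogenic chemicals).
    restart: probability of jumping back to the seeds (assumed value).
    """
    # Build a symmetric adjacency map from the edge list.
    adj = {}
    for (u, v), w in weights.items():
        adj.setdefault(u, {})[v] = w
        adj.setdefault(v, {})[u] = w
    nodes = sorted(adj)

    seed_set = set(seeds) & set(nodes)
    if not seed_set:
        raise ValueError("no seed node is present in the network")

    # Total edge weight per node, used to normalize outgoing transitions.
    strength = {u: sum(adj[u].values()) for u in nodes}

    # Initial distribution: equal probability on each seed node.
    p0 = {u: (1.0 / len(seed_set) if u in seed_set else 0.0) for u in nodes}
    p = dict(p0)

    for _ in range(max_iter):
        # Restart term: return to the seeds with probability `restart`.
        nxt = {u: restart * p0[u] for u in nodes}
        # Walk term: spread the remaining mass along weighted edges.
        for u in nodes:
            if strength[u] == 0:
                continue
            share = (1.0 - restart) * p[u] / strength[u]
            for v, w in adj[u].items():
                nxt[v] += share * w
        delta = sum(abs(nxt[u] - p[u]) for u in nodes)
        p = nxt
        if delta < tol:
            break
    return p


# Toy network with made-up weights; "A" plays the role of a known carcinogen.
toy_edges = {("A", "B"): 0.9, ("B", "C"): 0.8, ("C", "D"): 0.2}
scores = random_walk_with_restart(toy_edges, seeds=["A"])
ranking = sorted(scores, key=scores.get, reverse=True)
```

In a leave-one-out setting such as the one the abstract mentions, each known carcinogenic chemical would in turn be removed from the seed set and ranked by the scores obtained from the remaining seeds; how the thesis does this in detail is not stated here.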
Keywords/Search Tags: Carcinogenicity, Carcinogenic chemical, Machine learning, Network embedding method, Random forest, Text mining, Random walk with restart algorithm