In recent years, the volume of biomedical literature has grown exponentially, and the field of cancer research has accumulated a large body of publications. However, progress in cancer research remains slow, and the sheer scale of the literature creates difficulties for cancer researchers. There is also a lack of effective communication and information integration between the different areas of cancer research. In the era of machine learning, topic models have gained acceptance in both academia and industry. A topic model extracts semantic information, in the form of topics, from large collections of text. Compared with semantic mining methods based on ontologies and other knowledge bases, the semantic content produced by a topic model is more informative and more valuable for cross-domain knowledge discovery. In this paper, we apply topic models and cluster analysis to integrate information and knowledge, proposing two biomedical literature knowledge discovery methods: one based on topic extraction and one based on topic clustering. In the topic extraction method, we extract topics from the 2005 to 2014 literature on five types of cancer: breast cancer, lung cancer, colon cancer, pancreatic cancer, and prostate cancer. We trace the evolution of the common topics and construct a topic framework for each cancer. The prevailing trends and topic relevance of cancer research are then analyzed by topic fusion. In the topic clustering method, we improve the calculation of topic similarity and apply density peak clustering and affinity propagation clustering to the past 10 years of literature on six cancers: breast cancer, lung cancer, colon cancer, prostate cancer, urinary bladder cancer, and non-Hodgkin lymphoma. Taking breast cancer as an example, we analyze the content of the topic centers, summarize how the breast cancer topics developed, and find that the topic center is positively correlated with the number of publications. Additionally, the oxaliplatin topic center of lung cancer illustrates that medicine topics can provide great inspiration for the pharmaceutical industry. The validity and reliability of the method are verified by analysis of the Clinical Cancer Advances annual report. Breast cancer topic centers are preliminarily predicted by combining the topic framework with the topic centers. Finally, the relationship between cancers and topics is further analyzed through visualization as a cancer-topic chord diagram.
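The topic clustering step can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the improved topic similarity measure, so the Jensen-Shannon distance between topic-word distributions is assumed as a stand-in. The toy distributions, the cutoff `d_c`, and the function names `js_distance` and `density_peaks` are hypothetical. The clustering follows the general density peak idea: each topic gets a local density and a distance to its nearest denser topic, and topics that score high on both become topic centers.

```python
import math


def js_distance(p, q):
    """Jensen-Shannon distance (base-2 logs) between two discrete distributions.

    Assumed stand-in for the paper's topic similarity, which is not given here.
    """
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Kullback-Leibler divergence, skipping zero-probability terms.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return math.sqrt(max(0.0, (kl(p, m) + kl(q, m)) / 2))


def density_peaks(points, d_c, n_centers):
    """Density peak clustering with a hard cutoff d_c.

    Returns (centers, labels): indices of the chosen centers and, for every
    point, the position of its center in that list.
    """
    n = len(points)
    dist = [[js_distance(points[i], points[j]) for j in range(n)] for i in range(n)]

    # Local density: number of other points closer than the cutoff.
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < d_c) for i in range(n)]

    # Visit points from densest to sparsest (ties broken by index).
    order = sorted(range(n), key=lambda i: (-rho[i], i))
    delta = [0.0] * n          # distance to the nearest denser point
    nearest_denser = [-1] * n  # index of that point
    for rank, i in enumerate(order):
        if rank == 0:
            # The densest point has no denser neighbour; use its largest distance.
            delta[i] = max(dist[i])
            continue
        j = min(order[:rank], key=lambda k: dist[i][k])
        delta[i] = dist[i][j]
        nearest_denser[i] = j

    # Centers combine high density with a large distance to any denser point.
    centers = sorted(range(n), key=lambda i: (-rho[i] * delta[i], i))[:n_centers]

    labels = [-1] * n
    for label, c in enumerate(centers):
        labels[c] = label
    # Every other point inherits the label of its nearest denser neighbour,
    # which has already been labelled because of the visiting order.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[nearest_denser[i]]
    return centers, labels


# Toy topic-word distributions over a 4-word vocabulary (illustrative only).
topics = [
    [0.70, 0.20, 0.05, 0.05],
    [0.65, 0.25, 0.05, 0.05],
    [0.60, 0.30, 0.05, 0.05],
    [0.05, 0.05, 0.20, 0.70],
    [0.05, 0.05, 0.25, 0.65],
    [0.05, 0.05, 0.30, 0.60],
]

centers, labels = density_peaks(topics, d_c=0.08, n_centers=2)
print("topic centers:", centers)
print("cluster labels:", labels)
```

On real data, the rows would be the topic-word distributions produced by the topic model, and both `d_c` and the number of centers would need to be tuned, for example by inspecting a plot of density against distance to the nearest denser topic.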