| With the rapid development of biotechnology, the biomedical literature is growing exponentially. MEDLINE/PubMed, the most authoritative and widely recognized biomedical literature database, has included more than 20 million entries. At this scale it is very difficult to find and follow domain-specific information manually. Biomedical text mining, which extracts knowledge from large volumes of literature, has therefore become an active research area. This paper studies several key technologies in biomedical literature mining. The main work is as follows: (1) Manually scanning retrieved records for relevance is time-consuming and incomplete. To address this, an improved SVM classification algorithm designed for the biomedical literature domain is presented, combining the classical support vector machine (SVM) model with ontologies (Gene Ontology and Disease Ontology). Experiments show that this algorithm achieves an F1 value of 77.23%. (2) Manually matching complex clinical phenotypes with candidate harmful genes in the biomedical literature is similarly laborious. To address this, a new algorithm is presented that produces a candidate gene list based on the Human Phenotype Ontology (HPO). Experiments show that this algorithm can process 216 samples in 5 minutes. (3) Combining existing biomedical text mining tools with the machine learning algorithms above, a comprehensive integrated system solution is presented. Built on the J2EE architecture, the integrated system provides robustness, portability, and stability. The paper reports preliminary experiments and fundamental analyses on the scanning of retrieved records and on the relationships between clinical phenotypes and candidate genes, and uses the integrated system to explore potential knowledge and provide scientific support. |
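
The abstract does not describe how the HPO-based candidate gene list is computed, so the following is only a minimal illustrative sketch of one simple approach: ranking genes by the Jaccard overlap between a patient's phenotype terms and each gene's annotated terms. The function name `rank_candidate_genes`, the annotation table, and all term and gene identifiers are hypothetical placeholders, not the paper's algorithm or real HPO data.

```python
# Hypothetical toy annotation table: gene -> set of phenotype term IDs.
# The IDs and gene names are placeholders, not real HPO annotations.
GENE_TO_TERMS = {
    "GENE_A": {"HP:X1", "HP:X2", "HP:X3"},
    "GENE_B": {"HP:X2"},
    "GENE_C": {"HP:X4"},
}


def rank_candidate_genes(patient_terms, gene_to_terms):
    """Rank genes by Jaccard overlap with the patient's phenotype terms.

    This is an assumed scoring scheme for illustration only; genes that
    share no term with the patient are dropped from the list.
    """
    patient = set(patient_terms)
    scored = []
    for gene, terms in gene_to_terms.items():
        shared = patient & terms
        if not shared:
            continue  # no phenotype in common, not a candidate
        score = len(shared) / len(patient | terms)
        scored.append((gene, score))
    # Highest score first; ties broken alphabetically for determinism.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored


ranking = rank_candidate_genes(["HP:X1", "HP:X2"], GENE_TO_TERMS)
for gene, score in ranking:
    print(f"{gene}\t{score:.3f}")
```

A real system would load gene-to-phenotype annotations from the HPO distribution and would likely use a semantic similarity measure that exploits the ontology hierarchy rather than exact term overlap.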