Spermatogenesis plays a vital role in producing normal sperm in men, so research in this field can not only deepen our understanding of this biological process but also help develop treatments for male infertility. In this thesis, we aimed to collect and analyze literature and experimental data in existing databases from the perspective of data mining and systems biology, hoping to help experimental researchers work more effectively on spermatogenesis. First, using text mining technology, we processed a large number of abstracts downloaded from PubMed and developed an effective strategy to automatically extract the names of spermatogenesis functional genes from these abstracts. Then, based on data collected from the literature and existing databases, we designed a mathematical model to predict novel functional genes in spermatogenesis. Finally, integrating information from our collection, our predictions, and other related databases, we constructed a new specialized database containing detailed information on spermatogenesis-related genes. Our work and results are described in detail as follows:
(1) Much important information about spermatogenesis cannot be fully utilized because it is buried in a large number of research papers. Therefore, to obtain this buried knowledge, we proposed a text-mining-based strategy to extract the names of functional genes in spermatogenesis from abstracts. In the first step, we manually curated related abstracts and collected training datasets; then, we classified the remaining abstracts into three categories with an SVM classifier and selected abstracts with high classification confidence for gene name extraction. In the extraction step, we applied bio-entity recognition and negation recognition methods and used "co-occurrence" rules to filter the extracted gene names. Validation results suggest that our strategy performs well: extraction accuracy reached 71.9% compared with manual extraction results.
(2) By analyzing 18 spermatogenesis-related microarray datasets downloaded from the ArrayExpress database, we extracted 87-dimensional features to predict novel functional genes in spermatogenesis. The positive dataset was derived from text mining and manual curation, and the negative dataset was collected from the MGI database. We then designed the GAS algorithm to predict potential functional genes in spermatogenesis; a total of 762 genes were predicted with probability values greater than 0.5. Analysis of the known and predicted genes showed that they have similar gene functions, strongly suggesting that the predicted genes are involved in spermatogenesis.
(3) Using the LAMP stack, we constructed a new spermatogenesis functional gene database named SpermatogenesisOnline 1.0, which covers genes from 37 species: 1,666 genes were obtained from text mining and manual curation, and 762 genes came from GAS algorithm prediction. In addition, we annotated these genes in detail by integrating gene information from other databases. To help users access our database, we developed a web interface that provides a search service and meets users' different needs. Our website can be accessed at http://mcg.ustc.edu.cn/sdap1/spermgenes/index.php. ...
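The co-occurrence and negation filtering mentioned in step (1) can be illustrated with a minimal sketch. The thesis does not publish its actual rules, so everything here is an assumption: a candidate gene name is kept only if it appears in a sentence that also mentions a spermatogenesis context term and contains no negation cue. The keyword lists and the function names `split_sentences`, `tokenize`, and `cooccurrence_filter` are hypothetical, and the sentence-level negation check is a crude stand-in for a proper negation-scope method.

```python
import re

# Hypothetical context terms; the thesis's real co-occurrence vocabulary is not given.
CONTEXT_TERMS = {"spermatogenesis", "spermatogenic", "spermatocyte",
                 "spermatid", "spermatogonia", "testis", "meiosis"}
# Hypothetical negation cues; applied to the whole sentence (no scope detection).
NEGATION_CUES = {"not", "no", "neither", "nor", "without", "unaffected"}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tokenize(sentence):
    """Lower-cased word tokens; hyphens and digits are kept so 'Sycp3' stays one token."""
    return re.findall(r"[A-Za-z0-9-]+", sentence.lower())


def cooccurrence_filter(abstract, candidate_genes):
    """Return candidate genes that co-occur with a context term in at least
    one non-negated sentence of the abstract."""
    kept = set()
    for sentence in split_sentences(abstract):
        tokens = set(tokenize(sentence))
        if not tokens & CONTEXT_TERMS:
            continue  # no co-occurrence with the spermatogenesis context
        if tokens & NEGATION_CUES:
            continue  # negated statement, e.g. "X was not required for ..."
        for gene in candidate_genes:
            if gene.lower() in tokens:
                kept.add(gene)
    return sorted(kept)


if __name__ == "__main__":
    # Toy abstract; "GeneX" is a placeholder name, not a real gene.
    abstract = ("Stra8 is required for meiotic initiation in spermatogenesis. "
                "GeneX was not required for spermatogenesis in this example. "
                "Actb was used as a loading control.")
    print(cooccurrence_filter(abstract, ["Stra8", "GeneX", "Actb"]))
```

In this toy input, only Stra8 survives: GeneX sits in a negated sentence, and Actb never co-occurs with a context term. In practice, the candidate list would come from a bio-entity recognizer run on the high-confidence abstracts selected by the SVM step.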