| Predicting the genes responsible for human genetic disorders is currently an active topic in bioinformatics. With the completion of the human genome sequence and the development of next-generation sequencing technologies, the volume of data describing gene and protein interaction networks is growing rapidly. Analyzing these data to predict disease-causing genes offers new ways to decipher the genetic and molecular basis of disease, which is of considerable practical importance for genomics and medicine. Building on the association between the clinical descriptions of genetic diseases and the protein-protein interaction network, this paper collected the relevant data, performed text mining on the OMIM (Online Mendelian Inheritance in Man) database, and computed the phenotypic overlap between disorders with a vector space model. It then scored pairwise protein interactions from protein-protein interaction data and added disease-protein association data to build a series of biological networks, on which the phenotype and protein-network data were analyzed. Using these networks, candidate genes were scored and ranked to predict disease-causing genes. This paper proposes two new network-based methods for predicting human genetic disorder genes. First, extending the traditional two-dimensional relative probability model, it introduces two new probability models based on multidimensional random variables: the central probability model and the profile probability model. It then builds a prediction model on these three probability models that captures the correlation between phenotype similarity and protein-protein interactions. The second method is a regression-analysis prediction that uses a filter function: we examined the network data, summarized the patterns they exhibit, and formulated assumptions from them. 
The candidate genes were then ranked with the regression-analysis and filter-function model. The prediction model built on the three probability models (the relative, central, and profile probability models) captures the correlation between phenotype similarity and protein-protein interactions. Compared with the traditional model based on a single probability, it better reflects the actual characteristics of the biological system and has a stronger ability to predict disease-causing genes. The prediction model based on regression analysis and the filter function efficiently moves incorrect candidate genes to the end of the ranked list, which improves prediction efficiency. |
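The abstract states that phenotypic overlap between disorders is computed with a vector space model over OMIM text. The sketch below shows the general idea using tf-idf weighting and cosine similarity in plain Python. The tokenizer, the tf-idf weighting, and the toy phenotype descriptions are all assumptions for illustration; the abstract does not say which vocabulary or term weighting the thesis actually uses.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lower-case and split on non-letters (assumed; a real pipeline may map to a controlled vocabulary)."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs: dict[str, str]) -> dict[str, dict[str, float]]:
    """Build a sparse tf-idf vector for each phenotype description (assumed weighting scheme)."""
    counts = {name: Counter(tokenize(text)) for name, text in docs.items()}
    n_docs = len(docs)
    # Document frequency: in how many descriptions each term appears.
    df = Counter(term for c in counts.values() for term in c)
    return {
        name: {term: tf * math.log(n_docs / df[term]) for term, tf in c.items()}
        for name, c in counts.items()
    }


def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    """Cosine similarity of two sparse vectors; 0.0 if either vector is all zeros."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


# Hypothetical toy descriptions standing in for OMIM clinical synopses.
phenotypes = {
    "disorder_A": "muscle weakness cardiomyopathy elevated creatine kinase",
    "disorder_B": "progressive muscle weakness cardiomyopathy",
    "disorder_C": "hearing loss retinal degeneration",
}
vecs = tfidf_vectors(phenotypes)
print(cosine(vecs["disorder_A"], vecs["disorder_B"]))  # shared terms give a positive score
print(cosine(vecs["disorder_A"], vecs["disorder_C"]))  # no shared terms give 0.0
```

Terms that appear in every description receive zero idf weight, so only discriminating phenotype terms contribute to the overlap score.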
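To illustrate how a network can turn phenotype similarity into a ranked candidate list, and how a filter can push doubtful candidates to the end of that list, here is a minimal sketch. It is not the thesis's relative, central, or profile probability model, nor its regression-derived filter, whose exact forms the abstract does not give. As a stand-in, it uses a simple neighborhood score: each candidate gene is scored by the weighted phenotype similarity of the disorders linked to its interaction partners. The filter is any caller-supplied predicate. All gene names, weights, and similarity values are hypothetical.

```python
from collections import defaultdict


def neighborhood_scores(candidates, ppi, gene_disorders, similarity):
    """Score each candidate by the phenotype similarity of its interaction partners.

    ppi: {(gene_a, gene_b): interaction_weight}, treated as undirected.
    gene_disorders: {gene: set of disorders already linked to that gene}.
    similarity: {disorder: similarity to the query disorder}.
    """
    neighbors = defaultdict(dict)
    for (a, b), w in ppi.items():
        neighbors[a][b] = w
        neighbors[b][a] = w
    scores = {}
    for gene in candidates:
        total = 0.0
        for partner, w in neighbors[gene].items():
            linked = gene_disorders.get(partner, set())
            # Use the most similar disorder linked to this partner.
            best = max((similarity.get(d, 0.0) for d in linked), default=0.0)
            total += w * best
        scores[gene] = total
    return scores


def rank_with_filter(scores, passes_filter):
    """Sort by descending score, moving candidates rejected by the filter to the end."""
    return sorted(scores, key=lambda g: (not passes_filter(g), -scores[g]))


# Hypothetical toy network: candidate genes G1..G3, known disease proteins P1, P2.
candidates = ["G1", "G2", "G3"]
ppi = {("G1", "P1"): 0.9, ("G2", "P2"): 0.8, ("G3", "P1"): 0.2}
gene_disorders = {"P1": {"disorder_B"}, "P2": {"disorder_C"}}
similarity = {"disorder_B": 0.19, "disorder_C": 0.0}

scores = neighborhood_scores(candidates, ppi, gene_disorders, similarity)
# Stand-in filter: a fixed score threshold (the thesis derives its filter by regression).
print(rank_with_filter(scores, lambda g: scores[g] >= 0.05))
```

Because the filter is kept separate from the score, a candidate can be demoted even when its raw score is high, which mirrors the abstract's description of moving incorrect genes to the end of the queue.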