
Research On Key Techniques Of Biological Information Data Mining Based On PPI Network

Posted on: 2016-12-21
Degree: Doctor
Type: Dissertation
Country: China
Candidate: J M Zha
GTID: 1310330470465795
Subject: Education Technology
Abstract/Summary:
Proteins, as the products of gene expression and an important material basis of life, participate in almost all life activities and biological processes. Previous studies have found that most proteins cannot perform biological functions independently; instead, they exert their functions collaboratively through interactions with other proteins. With the steady accumulation and improvement of protein interaction data, the protein interaction network has gradually become one of the focuses of systems biology research on complex biological networks. This thesis carries out in-depth studies based on the protein interaction network, covering the mining of protein complexes, the identification of essential proteins, and the prediction of disease genes. The specific studies are as follows:

1. Based on recent findings about the core-attachment internal structure of protein complexes and the high co-expression among core proteins, an algorithm for mining protein complexes based on gene co-expression is proposed. First, gene expression data are used to construct a weighted protein interaction network according to the co-expression relationship between the genes that encode interacting proteins. Next, taking an edge-based view, highly weighted edges are selected as seeds to identify the cores of protein complexes. Finally, protein complexes are generated by identifying the attachment proteins for each core.

2. To address the deficiencies of existing essential protein identification algorithms, the thesis proposes a new algorithm that identifies essential proteins based on local connection strength. The algorithm exploits the fact that essential proteins usually correspond to high-degree proteins in the protein interaction network. Starting from source nodes and approaching the core nodes of the network according to local connectivity, it identifies high-degree essential proteins. Next, essential proteins in the sparse areas of the network are identified based on the local centrality of the protein nodes. The algorithm can thus identify essential proteins not only in the dense areas of the network but also in its sparse areas, which effectively compensates for the reliance on a single measure in essential protein identification.

3. Studies have found that essential proteins often gather in protein complexes or functional modules. The thesis also statistically analyses a standard protein complex data set, and the result shows that essential proteins are present in more than 60% of protein complexes. Based on this finding and the core-attachment internal structure of protein complexes, a new algorithm for identifying protein complexes based on essential proteins is proposed. First, essential protein nodes are taken as centres and extended to identify core proteins according to first-order connection strength. Then, protein complexes are generated by identifying the attachment proteins for each core according to second-order connection strength. Experimental results show that the proposed algorithm can effectively mine protein complexes in the protein network.

4. Studies have also found that proteins encoded by genes for the same or similar diseases tend to cluster together in the protein interaction network. Based on this finding, the thesis proposes a function-flow-based algorithm for identifying disease genes, which works through the protein interaction network and the correspondence between genes and proteins. First, the functional similarity between genes is estimated using Gene Ontology. Then, a weighted human protein interaction network is constructed, and the known disease genes and the candidate genes in the associated region are mapped onto the protein network. Afterwards, the known disease genes are used as source points to run the function flow algorithm, and the function score that each protein receives from the disease genes is calculated. Finally, the candidate genes in the associated region are ranked by function score, and genes with higher ranks are considered more likely to be disease genes.

In conclusion, the thesis studies practical applications of the protein interaction network. Using protein interaction, gene expression, and Gene Ontology data, two protein complex mining algorithms are designed, from the edge and node perspectives respectively, together with an essential protein identification algorithm based on connection strength and a disease gene prediction algorithm based on function flow. Experiments and analyses on real data sets indicate that the proposed algorithms are effective.
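As an illustration of the first study's opening steps (weighting interactions by gene co-expression and picking a high-weight edge as a seed), here is a minimal sketch. It assumes the edge weight is the Pearson correlation between the expression profiles of the two genes; the abstract does not state the exact weighting formula, so this choice, the function names (`pearson`, `weight_edges`, `seed_edge`), and the toy data are all hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # a constant profile carries no co-expression signal
    return cov / (sd_x * sd_y)

def weight_edges(edges, expression):
    """Assign each interaction the co-expression of its two genes (assumed weighting)."""
    return {(u, v): pearson(expression[u], expression[v]) for u, v in edges}

def seed_edge(weights):
    """Pick the edge with the largest weight as the seed of a complex core."""
    return max(weights, key=weights.get)

# Toy data (hypothetical): three proteins, four expression time points each.
expression = {
    "P1": [1.0, 2.0, 3.0, 4.0],
    "P2": [2.0, 4.0, 6.0, 8.0],
    "P3": [4.0, 3.0, 2.0, 1.0],
}
edges = [("P1", "P2"), ("P2", "P3")]

weights = weight_edges(edges, expression)
seed = seed_edge(weights)
print(seed, round(weights[seed], 3))
```

In this toy network the P1-P2 interaction is perfectly co-expressed, so it becomes the seed; growing the core and adding attachment proteins would follow from there and is not sketched here.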
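The disease gene ranking in the fourth study can be sketched as a flow-style propagation from known disease genes over a weighted network. The update rule below (sources hold an unlimited reservoir, flow only moves downhill and is capped by edge weight) is one common formulation and is an assumption here; the abstract does not give the thesis's exact rule. The names `function_flow` and `rank_candidates`, the number of steps, and the toy weights are hypothetical.

```python
def function_flow(weights, sources, steps=2):
    """Propagate 'function' from source nodes; return total inflow per node."""
    # Build a symmetric adjacency map from the undirected weighted edges.
    adj = {}
    for (u, v), w in weights.items():
        adj.setdefault(u, {})[v] = w
        adj.setdefault(v, {})[u] = w

    inf = float("inf")
    reservoir = {n: (inf if n in sources else 0.0) for n in adj}
    score = {n: 0.0 for n in adj}

    for _ in range(steps):
        flows = {}
        for u, nbrs in adj.items():
            total_w = sum(nbrs.values())
            for v, w in nbrs.items():
                if reservoir[u] > reservoir[v]:  # flow only goes downhill
                    if reservoir[u] == inf:
                        flows[(u, v)] = w  # sources are limited by edge capacity only
                    else:
                        flows[(u, v)] = min(w, reservoir[u] * w / total_w)
        # Apply all flows of this step at once.
        for (u, v), f in flows.items():
            if reservoir[u] != inf:
                reservoir[u] -= f
            if reservoir[v] != inf:
                reservoir[v] += f
            score[v] += f
    return score

def rank_candidates(score, candidates):
    """Order candidate genes from highest to lowest function score."""
    return sorted(candidates, key=lambda g: score.get(g, 0.0), reverse=True)

# Toy network (hypothetical): D is a known disease gene; A, B, C are candidates.
toy_weights = {("D", "A"): 0.9, ("A", "B"): 0.5, ("D", "C"): 0.2}
flow_score = function_flow(toy_weights, sources={"D"}, steps=2)
ranked = rank_candidates(flow_score, ["A", "B", "C"])
print(ranked)
```

In practice the edge weights would come from the Gene Ontology-based functional similarity described above, and the number of propagation steps would be a tuning choice.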
Keywords/Search Tags: gene co-expression, protein complex, function module, essential protein, complex disease, disease gene