Font Size: a A A

Algorithms For Mining Statistically Significant Communities

Posted on:2020-11-02Degree:MasterType:Thesis
Country:ChinaCandidate:Y F SuFull Text:PDF
GTID:2370330590496791Subject:Software engineering
Abstract/Summary:PDF Full Text Request
Community detection is an important research topic in network science.Its goal is to find a subset of vertices connected with each other closely in a network.So far,a variety of community detection algorithms from different perspectives have been proposed one by one.However,most of them ignore the statistical significance of the community.The problem whether communities can be obtained with the help of statistical significance,and whether statistical significance can become a criterion of evaluating the quality of protein complex has not yet been solved.In order to mine statistically significant communities,two methods for calculating p-value are proposed in this article.One is to calculate the p-value of the community directly and the other is to calculate the p-value of a single vertex.This paper proposes a method for calculating p-value of a community based on hypothesis testing.The null hypothesis is that there is no difference between the number of edges in a given community and the number of edges in a random null model.And the p-value is the proportion that communities in a random null model is denser than given community where the vertices remain unchanged.The calculation of p-value can be an objective function in communities mining process.The method can be used to evaluate protein complex.Experiments shows this method is feasible for mining statistically significant communities and its performance is outstanding for the evaluation of protein complex.The calculation method for the p-value of single vertex is proposed based on Gehan's Generalized Wilcoxon Test,which is a hypothesis testing.The null hypothesis is that there is no difference in the degree sequence of a vertex between in and out of current community.The p-value is one-sided probability because the test statistic obeys standard normal distribution.This paper designs an algorithm for mining statistically significant communities based on FDR.The algorithm is a process which many communities are mined.In an iteration of one community,with beginning of choosing a seed node and the construction of initial community,then the community is updated by the result of p-value calculation until it is convergent.The mining process is terminated when no seed vertex is available.The experiment shows the algorithm is quite competitive for statistically significant communities mining.
Keywords/Search Tags:Statistically significant community mining, p-value calculation, Hypothesis testing, False discovery rate
PDF Full Text Request
Related items