| With the development of information technology, vast amounts of data have been generated. These data are large in volume and rich in type, and they contain a great deal of potentially valuable information. Mining this latent, unknown information has become a focus of attention, and link prediction, as one of the important methods in data mining, has attracted close attention. However, as the amount of data grows, traditional link prediction methods are limited by high time complexity and heavy computation when applied to large-scale data mining, so they cannot be applied effectively to large-scale graph data. This paper addresses the problem from two directions. The first is sampling, which reduces the time complexity and computational cost of the problem. Because the quality of the sample is closely tied to the quality of the predictions, guaranteeing sampling quality is the core issue in the sampling process. The second is parallel computing based on Spark GraphX: a parallel framework makes it possible to handle large-scale link prediction problems effectively and to speed up computation. First, building on graph sketch techniques, this paper extends existing link prediction methods and proposes a link prediction method based on All-Distances Sketches (ADS). Combining ADS with existing prediction methods, we define link prediction measures based on the ADS structure, show how they reduce the time complexity of computing the similarity between two nodes, and give a concrete algorithm for ADS-based link prediction. Second, because Spark GraphX has clear advantages for graph computation, we design and implement a parallel ADS algorithm on the Spark GraphX platform, together with parallel algorithms for the traditional link prediction methods and for the ADS-based link prediction method. Finally, we compare the parallel algorithms experimentally in terms of running time and prediction accuracy to verify the effectiveness of the graph-sketch-based link prediction method. The experimental results show that the ADS-based link prediction algorithm maintains a reasonable level of prediction accuracy while reducing time complexity and improving running efficiency. |
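The abstract does not spell out how an ADS is built or which similarity measure is computed from it. Purely as an illustration, the following single-machine Python sketch assumes the common bottom-k definition of an All-Distances Sketch on an unweighted, undirected graph. Every node draws a random rank, and ADS(v) keeps node u whenever u's rank is among the k smallest ranks of the nodes at least as close to v as u (ties broken by node id). Because the bottom-k min-hash sketch of every d-hop neighborhood of v can be read off ADS(v), the similarity of two nodes can be estimated from their sketches instead of from their full neighbor sets. The similarity used here (the Jaccard coefficient of closed 1-hop neighborhoods) and all function names (`build_ads`, `jaccard_estimate`, and so on) are hypothetical choices, not the thesis's definitions. The per-node BFS is also a simple stand-in for the Spark GraphX implementation, not a description of it.

```python
# Illustrative sketch only: the bottom-k ADS definition, the Jaccard-based
# similarity, and all function names are assumptions, not the thesis's method.
import bisect
import random
from collections import deque


def bfs_distances(adj, source):
    """Hop distances from source to every reachable node (unweighted graph)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def assign_ranks(nodes, seed=0):
    """Give every node an independent uniform random rank in [0, 1)."""
    rng = random.Random(seed)
    return {v: rng.random() for v in sorted(nodes)}


def build_ads(adj, rank, k):
    """Bottom-k All-Distances Sketch for every node.

    Nodes are scanned in (distance, node id) order from v; node u enters
    ADS(v) when its rank is among the k smallest ranks scanned so far.
    One BFS per node makes this O(n * m), so it is for illustration only.
    """
    ads = {}
    for v in adj:
        dist = bfs_distances(adj, v)
        smallest = []  # sorted list of the k smallest ranks scanned so far
        sketch = []    # ADS(v) entries as (distance, rank, node)
        for u in sorted(dist, key=lambda x: (dist[x], x)):
            r = rank[u]
            if len(smallest) < k or r < smallest[-1]:
                sketch.append((dist[u], r, u))
                bisect.insort(smallest, r)
                if len(smallest) > k:
                    smallest.pop()  # drop the largest rank, keep only k
        ads[v] = sketch
    return ads


def neighborhood_sketch(ads_v, d, k):
    """Bottom-k sketch, as (rank, node) pairs, of the d-hop neighborhood of v."""
    return sorted((r, u) for dist, r, u in ads_v if dist <= d)[:k]


def jaccard_estimate(ads, a, b, k, d=1):
    """Estimate Jaccard similarity of the closed d-hop neighborhoods of a and b."""
    sa = neighborhood_sketch(ads[a], d, k)
    sb = neighborhood_sketch(ads[b], d, k)
    union = sorted(set(sa) | set(sb))[:k]  # bottom-k sketch of the union
    in_a = {u for _, u in sa}
    in_b = {u for _, u in sb}
    both = sum(1 for _, u in union if u in in_a and u in in_b)
    return both / len(union) if union else 0.0


def jaccard_exact(adj, a, b):
    """Exact Jaccard similarity of closed 1-hop neighborhoods, for comparison."""
    na, nb = set(adj[a]) | {a}, set(adj[b]) | {b}
    return len(na & nb) / len(na | nb)


if __name__ == "__main__":
    # Small undirected example graph as an adjacency dict (hypothetical data).
    adj = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1, 3], 3: [1, 2, 4], 4: [3]}
    ads = build_ads(adj, assign_ranks(adj, seed=7), k=2)
    print(jaccard_estimate(ads, 0, 3, k=2), jaccard_exact(adj, 0, 3))
```

When k is at least the number of nodes, every sketch holds all reachable nodes and the estimate equals the exact value. Smaller k shrinks each sketch and the cost of a similarity query at the price of estimation error, which mirrors the trade-off between time complexity and prediction accuracy that the abstract evaluates.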