Complex networks are composed of community structures that are internally related yet relatively independent of one another. Mining these community structures has wide application value in fields such as social networks. Among community detection algorithms, spectral clustering transforms the partitioning problem into computing the eigenvalues and eigenvectors of a matrix, which yields satisfactory partitions at reduced computational cost, but it cannot make effective use of redundant or high-dimensional data. The Louvain algorithm is a greedy, modularity-based community detection algorithm; however, when partitioning real networks it ignores the close attribute relationships between nodes, and the networks it processes contain redundant nodes and edges. To address these problems, this thesis proposes a spectral clustering algorithm based on user characteristics and a Louvain algorithm based on leaf nodes. The main work and contributions are as follows.

(1) A spectral clustering algorithm based on user characteristics. Traditional spectral clustering cannot handle the high-dimensional data found in social networks, and clustering large-scale data takes too long. This thesis therefore introduces cosine similarity to reduce the dimensionality of the high-dimensional user data (each pair of feature vectors is reduced to a single similarity score) and uses it to update the similarity matrix of traditional spectral clustering, improving the accuracy with which users are divided into communities. In addition, the Mini Batch K-Means algorithm replaces K-means in the clustering step, which preserves the quality of the community division while improving the algorithm's efficiency. Comparative experiments were conducted on the Spark platform with Weibo datasets of different scales and evaluated with the Davies-Bouldin index and running time. The results show that the improved algorithm achieves a good Davies-Bouldin index with relatively short running time, effectively addressing the problems of spectral clustering identified above.

(2) A Louvain algorithm based on leaf nodes. Because the original Louvain algorithm ignores the attribute relationships between nodes, the improved algorithm recalculates the similarity between users and uses it to redistribute the edge weights between users in the network. At the same time, nodes without user characteristics and nodes with only outgoing edges are filtered out as leaf nodes, which streamlines the initialization of the algorithm and reduces redundancy when the dataset is initialized. After the algorithm runs, each leaf node is assigned directly to the community of its neighbor nodes, which improves efficiency while preserving the quality of the community division. The experiments were conducted on a Weibo dataset using Spark's GraphX component, which simplified programming, provided a degree of parallelism, and increased running speed. Comparisons of modularity and running time show that the improved Louvain algorithm attains relatively good modularity while consuming less time, improving the efficiency of the algorithm.
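Contribution (1) builds the spectral clustering similarity matrix from user features via cosine similarity. The following is a minimal pure-Python sketch of that idea under stated assumptions: each user is a dense numeric feature vector, negative cosines are clipped to zero, and the normalized affinity D^-1/2 W D^-1/2 (the Ng-Jordan-Weiss formulation) is used, because the abstract does not say which Laplacian the thesis uses. All function names are hypothetical, and a simple subspace (block power) iteration stands in for a proper eigensolver; at Spark scale this step would use distributed linear algebra instead.

```python
import math
import random


def cosine_similarity(u, v):
    """Cosine of the angle between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u > 0 and norm_v > 0 else 0.0


def affinity_matrix(features):
    """n x n similarity matrix W[i][j] = cos(x_i, x_j) with a zero diagonal.

    Assumption: negative cosines are clipped to 0 so W is a valid non-negative affinity.
    """
    n = len(features)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            w[i][j] = w[j][i] = max(0.0, cosine_similarity(features[i], features[j]))
    return w


def orthonormalize(vectors):
    """Classical Gram-Schmidt; assumes the vectors are linearly independent."""
    basis = []
    for v in vectors:
        for b in basis:
            proj = sum(x * y for x, y in zip(v, b))
            v = [x - proj * y for x, y in zip(v, b)]
        norm = math.sqrt(sum(x * x for x in v))
        basis.append([x / norm for x in v])
    return basis


def spectral_embedding(w, k, iters=200, seed=0):
    """Rows of the top-k eigenvectors of D^-1/2 W D^-1/2, each row scaled to unit length."""
    n = len(w)
    inv_sqrt_deg = [1.0 / math.sqrt(d) if d > 0 else 0.0 for d in map(sum, w)]
    # M = (I + D^-1/2 W D^-1/2) / 2 has the same eigenvectors but eigenvalues in [0, 1],
    # so repeated multiplication plus re-orthonormalization converges to the k largest.
    m = [[(0.5 if i == j else 0.0) + 0.5 * inv_sqrt_deg[i] * w[i][j] * inv_sqrt_deg[j]
          for j in range(n)] for i in range(n)]
    rng = random.Random(seed)
    vecs = orthonormalize([[rng.random() for _ in range(n)] for _ in range(k)])
    for _ in range(iters):
        vecs = orthonormalize([[sum(m[i][j] * v[j] for j in range(n)) for i in range(n)]
                               for v in vecs])
    embedding = []
    for i in range(n):
        row = [v[i] for v in vecs]
        norm = math.sqrt(sum(x * x for x in row)) or 1.0  # guard against all-zero rows
        embedding.append([x / norm for x in row])
    return embedding
```

The rows of the returned embedding are the points that the clustering step (K-means, or Mini Batch K-Means in the improved algorithm) then partitions into k communities.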
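The improved algorithm in (1) replaces K-means with Mini Batch K-Means, which updates the cluster centers from small random samples instead of the full dataset on every iteration. Below is a sketch of the usual mini-batch update, in which each center moves toward the batch points assigned to it with a step size of 1/(number of points it has absorbed so far). The deterministic farthest-point seeding and the `batch_size` and `iters` defaults are illustrative assumptions, not values taken from the thesis.

```python
import random


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_center(centers, x):
    """Index of the center closest to x."""
    return min(range(len(centers)), key=lambda c: squared_distance(centers[c], x))


def farthest_point_init(points, k):
    """Deterministic seeding (an assumption): start from points[0], then repeatedly
    add the point farthest from every center chosen so far."""
    centers = [list(points[0])]
    while len(centers) < k:
        far = max(points, key=lambda p: min(squared_distance(p, c) for c in centers))
        centers.append(list(far))
    return centers


def mini_batch_kmeans(points, k, batch_size=4, iters=50, seed=0):
    """Mini-batch k-means: each iteration updates centers from one small random batch."""
    rng = random.Random(seed)
    centers = farthest_point_init(points, k)
    counts = [0] * k
    for _ in range(iters):
        batch = [points[rng.randrange(len(points))] for _ in range(batch_size)]
        # Assign the whole batch against the current centers before moving any of them.
        assigned = [nearest_center(centers, x) for x in batch]
        for x, c in zip(batch, assigned):
            counts[c] += 1
            eta = 1.0 / counts[c]  # per-center step size shrinks as the center sees more points
            centers[c] = [(1.0 - eta) * cv + eta * xv for cv, xv in zip(centers[c], x)]
    labels = [nearest_center(centers, p) for p in points]
    return centers, labels
```

Because each iteration touches only `batch_size` points, the cost per iteration no longer grows with the dataset; scikit-learn provides a ready-made version as `sklearn.cluster.MiniBatchKMeans`.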
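The Davies-Bouldin index used to evaluate (1) averages, over all clusters, the worst ratio of combined within-cluster scatter to the distance between cluster centroids, so lower values indicate compact, well-separated communities. A small reference sketch using Euclidean distance follows; scikit-learn exposes the same metric as `sklearn.metrics.davies_bouldin_score`.

```python
import math


def davies_bouldin(points, labels):
    """Davies-Bouldin index: mean over clusters i of max_{j != i} (S_i + S_j) / d(c_i, c_j).

    S_i is the mean Euclidean distance from cluster i's points to its centroid c_i.
    Lower is better. Assumes at least two clusters with distinct centroids.
    """
    clusters = sorted(set(labels))
    if len(clusters) < 2:
        raise ValueError("Davies-Bouldin index needs at least two clusters")
    centroids, scatter = {}, {}
    for c in clusters:
        members = [p for p, label in zip(points, labels) if label == c]
        centroid = [sum(coords) / len(members) for coords in zip(*members)]
        centroids[c] = centroid
        scatter[c] = sum(math.dist(p, centroid) for p in members) / len(members)
    worst = [
        max((scatter[i] + scatter[j]) / math.dist(centroids[i], centroids[j])
            for j in clusters if j != i)
        for i in clusters
    ]
    return sum(worst) / len(clusters)
```

For four points forming two tight, distant pairs, such as (0, 0), (0, 2), (10, 0), (10, 2), labelling each pair as its own cluster scores 0.2, while the crossed labelling scores 5.0.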
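The leaf-node handling in (2) wraps Louvain with a pre-processing step and a post-processing step. The sketch below is one possible reading of that description, and its details are assumptions: edges are directed (source, target, weight) triples, a leaf is a node with no user features or with in-degree 0, and a leaf joins whichever neighboring community it shares the most edge weight with, since the abstract does not say how a leaf with several neighbors is resolved. The helper names are hypothetical, and the Louvain run itself is not shown.

```python
from collections import defaultdict


def split_leaf_nodes(edges, has_features):
    """Hypothetical helper: mark leaf nodes and return the edges Louvain should run on.

    edges: directed (source, target, weight) triples.
    has_features: node -> True if the user has profile features.
    A node is a leaf if it has no features or only outgoing edges (in-degree 0).
    """
    in_deg, out_deg, nodes = defaultdict(int), defaultdict(int), set()
    for u, v, _ in edges:
        out_deg[u] += 1
        in_deg[v] += 1
        nodes.update((u, v))
    leaves = {n for n in nodes
              if not has_features.get(n, False) or (out_deg[n] > 0 and in_deg[n] == 0)}
    core_edges = [(u, v, w) for u, v, w in edges if u not in leaves and v not in leaves]
    return leaves, core_edges


def attach_leaves(leaves, edges, community):
    """Hypothetical helper: put each leaf into the neighboring community it is most
    strongly connected to (total edge weight, either direction). Leaves with no
    non-leaf neighbor stay unassigned."""
    votes = defaultdict(lambda: defaultdict(float))
    for u, v, w in edges:
        if u in leaves and v in community:
            votes[u][community[v]] += w
        if v in leaves and u in community:
            votes[v][community[u]] += w
    result = dict(community)
    for leaf, scores in votes.items():
        result[leaf] = max(scores, key=scores.get)
    return result
```

In use, `split_leaf_nodes` runs first, Louvain partitions only `core_edges`, and `attach_leaves` then labels the leaves from that partition, so the leaves never enter Louvain's iterative phase.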
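Both the Louvain algorithm and the evaluation in (2) rely on modularity, Q = sum over communities c of [L_c/m - (d_c/(2m))^2], where m is the total edge weight, L_c is the weight of edges inside community c, and d_c is the summed weighted degree of the nodes in c. A minimal sketch for an undirected weighted graph, assuming each edge is listed once and there are no self-loops:

```python
from collections import defaultdict


def modularity(edges, community):
    """Newman-Girvan modularity of an undirected weighted graph.

    edges: (u, v, weight) triples, each undirected edge listed once, no self-loops.
    community: node -> community id, covering every endpoint.
    """
    m = sum(w for _, _, w in edges)
    intra = defaultdict(float)   # L_c: weight of edges inside community c
    degree = defaultdict(float)  # d_c: summed weighted degree of the nodes in c
    for u, v, w in edges:
        cu, cv = community[u], community[v]
        degree[cu] += w
        degree[cv] += w
        if cu == cv:
            intra[cu] += w
    return sum(intra[c] / m - (degree[c] / (2.0 * m)) ** 2 for c in degree)
```

Two triangles joined by a single bridge edge, split along the bridge, give Q = 5/14 (about 0.357), while putting every node in one community gives Q = 0.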