
Research On Load Allocation Strategy Based On Data Clustering

Posted on: 2019-12-05 | Degree: Master | Type: Thesis
Country: China | Candidate: M Tian | Full Text: PDF
GTID: 2428330566994469 | Subject: Computer Science and Technology Software Engineering
Abstract/Summary:
With the vigorous development of the Internet and self-media, the network has become the main way for people to communicate with each other and obtain useful information, and the amount of information on the network is growing exponentially. To ensure normal service and efficient access to network information, managers need to mine the huge volume of user browsing data so that the resource manager can reasonably allocate load across the cluster's service nodes. Otherwise, both the processing efficiency and the effective duration of the nodes are reduced.

This thesis studies load distribution among the nodes of a cluster. It collects the web sites that provide network information services, deduplicates them with a Bloom filter, and then clusters the web site data with the improved EK-Means algorithm. Each node is assigned a weight (LDPR) that combines the node's importance, computed with the PageRank algorithm, with its real-time processing capability. This weight serves as the basis for assigning subtasks to nodes in the scheduling strategy, and the results are applied to a web site location scenario for verification. The experimental results confirm that using LDPR values as the basis for load distribution improves data processing efficiency.

The main work of this thesis is as follows:

(1) Data clustering is studied. The Bloom filter error formula is used to select a suitable number of random hash functions so that the error rate is reduced as much as possible. The improved EK-Means algorithm is then used to cluster the data. The core idea of the improvement is to place the initial centers closer to the true cluster centers, which reduces the number of iterations and improves clustering efficiency.

(2) An importance evaluation parameter is introduced for nodes and the load redistribution strategy is improved. An LDPR indicator is defined on the basis of the PageRank algorithm, and a load distribution strategy based on LDPR is proposed. Comparative experiments verify the effectiveness of the LDPR algorithm and provide the basis for allocating tasks in the scheduling algorithm.

(3) Experimental simulation. The Bloom filter and the PageRank-based load balancing strategy are applied to a real web site location task and compared with other load distribution methods.

In summary, this thesis evaluates the load capacity of nodes in a cluster environment from two perspectives: the importance of each node and its real-time processing capacity. Experiments show that using the LDPR factor as the load distribution factor improves data processing efficiency...
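The deduplication step in item (1) can be illustrated with a minimal Python sketch. It is not the thesis's implementation. It assumes the standard Bloom filter error formula p ≈ (1 - e^(-kn/m))^k, from which the usual sizing m = -n ln p / (ln 2)^2 and k = (m/n) ln 2 follow. The names `optimal_parameters`, `BloomFilter`, and `deduplicate` are hypothetical, and the k hash positions are derived by double hashing over a single SHA-256 digest rather than by the independent random hash functions mentioned in the abstract.

```python
import hashlib
import math


def optimal_parameters(n_items: int, target_fp_rate: float) -> tuple[int, int]:
    """Return (m_bits, k_hashes) from the standard Bloom filter sizing formulas."""
    # m = -n * ln(p) / (ln 2)^2 gives the bit-array size for the target error rate.
    m_bits = math.ceil(-n_items * math.log(target_fp_rate) / (math.log(2) ** 2))
    # k = (m / n) * ln 2 is the error-minimizing number of hash functions.
    k_hashes = max(1, round(m_bits / n_items * math.log(2)))
    return m_bits, k_hashes


class BloomFilter:
    """Hypothetical minimal Bloom filter for string items (illustration only)."""

    def __init__(self, n_items: int, target_fp_rate: float) -> None:
        self.m, self.k = optimal_parameters(n_items, target_fp_rate)
        self.bits = bytearray((self.m + 7) // 8)

    def _positions(self, item: str):
        # Assumption: double hashing (h1 + i * h2) stands in for k random hashes.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force an odd step
        for i in range(self.k):
            yield (h1 + i * h2) % self.m

    def add(self, item: str) -> bool:
        """Insert item; return True if it was (possibly) already present."""
        seen = True
        for pos in self._positions(item):
            byte, bit = divmod(pos, 8)
            if not self.bits[byte] & (1 << bit):
                seen = False
                self.bits[byte] |= 1 << bit
        return seen


def deduplicate(urls, expected: int = 1000, fp_rate: float = 0.01):
    """Keep the first occurrence of each URL; false positives may drop a few."""
    bloom = BloomFilter(expected, fp_rate)
    return [url for url in urls if not bloom.add(url)]
```

For example, `deduplicate(["a.com", "b.com", "a.com"])` keeps a single copy of `a.com`. A Bloom filter never misses a true duplicate, but with probability up to roughly the target error rate it may discard a URL that was never seen, which is why the choice of the number of hash functions matters.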
Keywords/Search Tags:Spark platform, load distribution, data clustering, K-Means algorithm, bloom filte