Research On The Frequent Itemsets Mining And The Scheduling Strategy Under Cloud Environment

Posted on: 2014-08-20
Degree: Master
Type: Thesis
Country: China
Candidate: B Liu
GTID: 2268330425456194
Subject: Computer application technology
Abstract/Summary:
Cloud computing provides fast, safe and convenient data storage and diversified network services. Through the internet, people can obtain big data computing and a variety of customized services from any data terminal at any location. For these reasons, cloud computing has become a hot topic among domestic and foreign scholars. Mining the information that interests a user from large-scale data is an important application field of cloud computing. However, data transmission between nodes has become one of the performance bottlenecks of large-scale scientific computing in the cloud environment. This paper therefore first proposes to reduce, and where possible avoid, data transmission while still mining deterministic information efficiently. How effectively system resources are managed is an important indicator of system performance in a large-scale data environment. Advance reservation scheduling with deadlines increases the predictability of system resources. However, as the scale of the tasks grows, the resource fragmentation it generates reduces both overall system performance and the task hit rate. This paper studies how to reduce the impact of resource fragmentation on system performance. As a commercial service, cloud computing must also satisfy users' expectations of high QoS. Because data-intensive workflow applications share a large amount of data, both high-performance computing ability and user cost must be considered when they are executed in the cloud environment. This paper presents several ideas aimed at maximizing cost-benefit and quality of service for the user.
The main contents are as follows:
First, to avoid the transmission of dependent data during parallel frequent itemset mining in the cloud environment, this paper proposes an improved algorithm called A Parallel Frequent Itemsets Mining Algorithm Based on Binary Coding and Clustering. Encoding the clusters reduces the dependencies among compute nodes. A shared multi-head table avoids data transmission between nodes entirely, which increases the efficiency of parallel frequent itemset mining while producing only a small number of expanded frequent items. Experiments show that the algorithm performs better than existing parallel frequent itemset mining algorithms for any type of data.
Second, to take full advantage of the resource fragmentation generated when tasks with deadlines are scheduled by advance reservation, computational geometry is used to map the system resources. The plane is split horizontally to build several balanced search trees with a special structure. Compared with a single-tree structure, the proposed method reduces the time cost of updating information. An evaluation indicator of fragmentation influence is presented to account for the impact on system performance caused by the length and time of the fragments. By selecting the optimal resource fragments for scheduling, the method achieves higher system utilization and task hit rates than existing resource scheduling strategies.
Finally, to provide satisfactory service for the user when data-intensive workflow applications are executed in the cloud environment, an improved algorithm based on dependency degree partition is proposed. Tasks are first grouped by dependency degree and assigned priorities according to the initial parameters. The scheduling of the prioritized tasks is then optimized according to the relationships among the groups. This greatly reduces user cost and gives the user an ideal acceleration ratio between cost and service performance. In addition, the service provider can offer a more comprehensive resource service.
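The abstract does not give the details of the binary coding used in the first contribution, so the following is only a minimal sketch of the general idea, not the thesis's algorithm. It assumes each transaction is packed into an integer bitmask (one bit per item), so that testing whether an itemset occurs in a transaction becomes a single bitwise AND. The names `encode` and `frequent_itemsets`, the `max_size` limit, and the brute-force candidate enumeration are all illustrative assumptions; the clustering step and the shared multi-head table are not modeled.

```python
from itertools import combinations


def encode(items, item_index):
    """Pack a collection of items into an integer bitmask, one bit per item."""
    mask = 0
    for item in items:
        mask |= 1 << item_index[item]
    return mask


def frequent_itemsets(transactions, min_support, max_size=3):
    """Brute-force frequent itemset counting over bitmask-encoded transactions.

    Hypothetical helper for illustration only: it enumerates every candidate
    itemset up to max_size and does no pruning, clustering, or parallelism.
    """
    items = sorted({item for t in transactions for item in t})
    item_index = {item: i for i, item in enumerate(items)}
    encoded = [encode(t, item_index) for t in transactions]

    result = {}
    for size in range(1, max_size + 1):
        for candidate in combinations(items, size):
            mask = encode(candidate, item_index)
            # A transaction contains the candidate if all candidate bits are set.
            support = sum(1 for t in encoded if (t & mask) == mask)
            if support >= min_support:
                result[candidate] = support
    return result


if __name__ == "__main__":
    baskets = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
    for itemset, support in frequent_itemsets(baskets, min_support=3).items():
        print(itemset, support)
```

Because each node in such a scheme would only need the compact integer codes rather than the raw transactions, an encoding of this kind is one plausible way to cut the volume of data exchanged between nodes.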
Keywords/Search Tags: cloud computing, frequent itemsets, resource scheduling, workflow scheduling, quality of service