
Optimizing Multithreaded Applications For Competitive Multicore Environments

Posted on: 2017-09-16    Degree: Doctor    Type: Dissertation
Country: China    Candidate: Y Q Peng    Full Text: PDF
GTID: 1318330503458160    Subject: Computer system architecture
Abstract/Summary:
With the prevalence of multicore processors, programming models have shifted from the traditional sequential pattern to thread-level parallelism in order to take full advantage of abundant computing resources. Because multicore machines offer plentiful computing resources and memory capacity, it is common to consolidate multiple multithreaded applications, or virtual machines hosting multithreaded applications, on a single machine to maximize resource utilization, which results in contention for CPU resources among the co-located applications. For example, a 2010 study from VMware reports that each physical core is commonly time-shared by four virtual CPUs (VCPUs). In such competitive multicore environments, useless threads (such as threads busy-waiting in synchronization operations) can waste CPU resources that should have been devoted to useful threads, significantly degrading the performance of multithreaded applications. Efficient use of CPU resources is therefore essential for guaranteeing the performance of multithreaded applications in competitive multicore environments.

First, many multithreaded applications follow the Single Program Multiple Data (SPMD) programming model, which alternates computation phases with communication phases separated by barrier synchronization, so their performance depends heavily on barrier latency. Barrier latency, however, can be extended substantially in multiprogrammed environments, because the schedulers of most mainstream operating systems are unaware of synchronization operations inside multithreaded applications and do not schedule laggard threads in a timely manner. While a barrier is pending, spin-waiting threads waste computing resources and, once blocked, relinquish their cores to co-running applications, which can significantly hurt both system throughput and fairness. To address these issues, a time-donating barrier strategy named Tidon donates the timeslices of waiting threads to their preempted laggard siblings (i.e., threads of the same application), so that waiting threads contribute directly to the completion of the barrier. Experimental results show that Tidon reduces the performance degradation of barrier-intensive applications by up to a factor of 17.9 without hurting, and sometimes improving, the performance of non-barrier-intensive applications, yielding good fairness among co-running applications. Tidon is also effective in virtualized environments.

Second, balancing the workload among threads is an important means of guaranteeing the efficiency of multithreaded applications, because an application cannot finish until its last thread finishes. Work-stealing is widely used to achieve dynamic load balancing in multithreaded applications, but previous work has shown that it becomes inefficient in competitive environments, such as traditional multiprogrammed environments and virtualized environments, because the unsuccessful steals performed by thieves waste CPU resources. Although several efforts have improved the efficiency of work-stealing in traditional multiprogrammed environments, none of them has demonstrated effectiveness in virtualized environments. A minimal sketch of the basic steal loop, illustrating where unsuccessful steals burn CPU time, is given below.
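The following C++ sketch illustrates the conventional work-stealing pattern described above. It is not code from this dissertation; the worker count, task type, and yield-on-failure policy are illustrative assumptions. The comment in the thief loop marks the point where, on an oversubscribed core, failed steal attempts consume timeslices that preempted, task-holding siblings or co-running applications could otherwise use.

```cpp
// Minimal work-stealing sketch (illustrative only, not the dissertation's Robinhood code).
#include <atomic>
#include <cstdio>
#include <deque>
#include <functional>
#include <mutex>
#include <random>
#include <thread>
#include <vector>

struct WorkerDeque {
    std::mutex m;
    std::deque<std::function<void()>> tasks;

    bool pop(std::function<void()>& t) {          // owner takes from the bottom
        std::lock_guard<std::mutex> g(m);
        if (tasks.empty()) return false;
        t = std::move(tasks.back()); tasks.pop_back(); return true;
    }
    bool steal(std::function<void()>& t) {        // thief takes from the top
        std::lock_guard<std::mutex> g(m);
        if (tasks.empty()) return false;
        t = std::move(tasks.front()); tasks.pop_front(); return true;
    }
    void push(std::function<void()> t) {
        std::lock_guard<std::mutex> g(m);
        tasks.push_back(std::move(t));
    }
};

int main() {
    const int nworkers = 4;
    std::vector<WorkerDeque> deques(nworkers);
    std::atomic<int> remaining{1000};

    for (int i = 0; i < 1000; ++i)                // all work starts on worker 0's deque
        deques[0].push([&remaining] { remaining.fetch_sub(1); });

    std::vector<std::thread> workers;
    for (int id = 0; id < nworkers; ++id) {
        workers.emplace_back([&, id] {
            std::mt19937 rng(id);
            std::function<void()> task;
            while (remaining.load() > 0) {
                if (deques[id].pop(task)) { task(); continue; }
                int victim = rng() % nworkers;    // pick a random victim and try to steal
                if (victim != id && deques[victim].steal(task)) { task(); continue; }
                // Unsuccessful steal: on a dedicated machine this cheap retry is harmless,
                // but on an oversubscribed core it burns a timeslice that a preempted,
                // task-holding worker (or a co-running application) could have used.
                std::this_thread::yield();
            }
        });
    }
    for (auto& t : workers) t.join();
    std::printf("all tasks done\n");
}
```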
To address this issue, a scheduling framework named Robinhood uses thieves to accelerate useful threads at both the guest operating system level and the virtual machine monitor level. Compared with Cilk++ and BWS, Robinhood improves the performance of work-stealing applications by up to 90% and 72%, respectively.

Finally, data-parallel applications are common in cloud computing environments and have proven a promising way to harness the abundant resources of multicore platforms. MapReduce was initially implemented on clusters, but it has also been shown to be an effective programming model on a single multicore machine. Meanwhile, as the number of virtual machines consolidated on a single multicore machine grows in cloud environments, CPU resources become increasingly valuable. Existing MapReduce systems for a single multicore machine, however, favor generality over aggressive optimization, which introduces unnecessary overheads and wastes CPU resources in some cases. To address this issue, a customizable MapReduce system named Peacock builds an application characterization model, classifies MapReduce-based multithreaded applications into four classes, and customizes an efficient execution flow for each class so as to minimize unnecessary overheads. Experimental results show that Peacock outperforms Phoenix++ by up to a speedup of 3.6 for workloads that inherently emit only one value per key (a sketch of this pattern follows the summary below).

In summary, by exploring how to enhance the resource efficiency of multithreaded applications from these different perspectives, this dissertation proposes a series of methods that optimize multithreaded applications in competitive multicore environments and thereby guarantee their performance.
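As a concrete illustration of the workload class mentioned above, the following C++ sketch shows a word-count-style MapReduce on a single multicore machine where each key holds exactly one combined value per thread. It is an assumption-laden illustration of the pattern, not code from Peacock or Phoenix++; the function names and data layout are invented for this example. Because each key needs only a single running sum, the per-thread combiner replaces intermediate value lists, and the reduce phase degenerates into merging the per-thread tables.

```cpp
// Illustrative "one emitted value per key" MapReduce sketch (e.g., word count).
#include <algorithm>
#include <cstdio>
#include <functional>
#include <string>
#include <thread>
#include <unordered_map>
#include <vector>

using Counts = std::unordered_map<std::string, long>;

// map + combine in one step: one accumulator per key, no intermediate value lists
static void map_chunk(const std::vector<std::string>& words,
                      size_t begin, size_t end, Counts& local) {
    for (size_t i = begin; i < end; ++i)
        ++local[words[i]];               // combine immediately into a single value per key
}

int main() {
    std::vector<std::string> words = {"a", "b", "a", "c", "b", "a", "c", "c", "a"};
    const size_t nthreads = 2;
    std::vector<Counts> locals(nthreads);
    std::vector<std::thread> threads;

    size_t chunk = (words.size() + nthreads - 1) / nthreads;
    for (size_t t = 0; t < nthreads; ++t) {
        size_t begin = t * chunk, end = std::min(words.size(), begin + chunk);
        threads.emplace_back(map_chunk, std::cref(words), begin, end, std::ref(locals[t]));
    }
    for (auto& th : threads) th.join();

    Counts global;                        // "reduce" degenerates to merging per-thread tables
    for (const auto& local : locals)
        for (const auto& kv : local) global[kv.first] += kv.second;

    for (const auto& kv : global)
        std::printf("%s: %ld\n", kv.first.c_str(), kv.second);
}
```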
Keywords/Search Tags: Multithreaded Applications, Concurrency, Virtualization, Application Characterization, Synchronization, Work-stealing