On the Java Virtual Machine's Performance Analyses and Optimizations for Data-Intensive Applications

Posted on: 2017-04-19
Degree: Master
Type: Thesis
Country: China
Candidate: T Y Lei
Full Text: PDF
GTID: 2428330590991622
Subject: Software engineering
Abstract/Summary:
The Big Data era has arrived, and people in many fields face the challenge of handling ever more data. To address it, experts and researchers have developed many frameworks. State-of-the-art frameworks such as Spark and Hadoop are generally written in managed-runtime languages such as Scala and Java. However, a managed runtime such as the Java Virtual Machine (JVM) introduces extra performance overhead in order to maintain data abstractions and manage memory. In addition, the extra abstraction widens the semantic gap between Big Data applications and the hardware, so such applications can hardly exploit hardware parallelism fully.

Specifically, this work analyzes three problems in Big Data applications running on top of the JVM:

1. To maintain additional abstractions (e.g. type safety, automatic garbage collection), the JVM incurs extra memory overhead. For garbage collection, memory management requires extra read or write barriers, which slow down the running application.
2. To support both type safety and garbage collection, an object header is co-allocated with each Java object; traditional C/C++ objects have no such header. When an application is filled with small objects, the object headers, although incidental to the program's logic, may have a non-trivial performance impact.
3. The Just-In-Time compiler has limitations. It cannot exploit the specialized semantics of Big Data applications and cannot provide extra optimizations for this scenario. Meanwhile, operations in Spark are element-based; this pattern does not scale efficiently to the scenario and does not allow the compiler to exploit the semantics.

To solve these problems, this work proposes three corresponding solutions:

1. Use hardware virtualization to remove barrier overhead. This work uses the dirty bit in the page table to replace the dirty card in the card table. By leveraging the guest page table, the page table can serve as the card table: where the card table would be read, the dirty bit in the page table is read instead. Since the page table is maintained by the hardware, the JVM does not need to generate barriers.
2. Divided object allocation for an efficient Java object implementation. This layout allocates the object header and the object data separately, so the header has no direct impact on reads and writes of the data. This work proposes three runtime designs for retrieving object data and implements the one with the highest performance. With this approach, object data access incurs no extra overhead, and overall memory locality improves when the application accesses object data.
3. SuperVector, an optimized abstraction for vector-based Big Data machine learning applications. This data abstraction aggregates many vectors into one structure and provides coarse-grained SuperVector operations. Based on this abstraction, optimized interfaces can be proposed. Data-intensive scenarios usually exhibit a many-to-one pattern; with SuperVector, such semantics can be exploited, enabling compiler optimizations.

All three approaches have been implemented in the Java Virtual Machine and a Big Data framework and evaluated.
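For readers unfamiliar with the barrier that the first solution aims to remove, the following is a minimal sketch of a software card-marking write barrier. It is a toy model, not the thesis's implementation or HotSpot's actual code: the class name CardTableSketch, the simulated heap, the 512-byte card size, and the CLEAN/DIRTY encoding are all assumptions chosen for illustration. It shows the extra store that accompanies every reference write, which is the work a hardware-maintained dirty bit would make unnecessary.

```java
/** Toy model of a card-marking write barrier (illustrative only). */
public class CardTableSketch {
    static final int CARD_SHIFT = 9;        // assumption: 512-byte cards
    static final byte CLEAN = 0, DIRTY = 1; // assumption: illustrative encoding

    final long[] heap;      // simulated heap; each slot models an 8-byte reference field
    final byte[] cardTable; // one byte of bookkeeping per card

    CardTableSketch(int heapBytes) {
        heap = new long[heapBytes / 8];
        // Round up so that a partially filled last card still gets an entry.
        cardTable = new byte[(heapBytes + (1 << CARD_SHIFT) - 1) >>> CARD_SHIFT];
    }

    /** A reference store plus its barrier: the second line is the per-write overhead. */
    void storeRef(int byteOffset, long ref) {
        heap[byteOffset / 8] = ref;
        cardTable[byteOffset >>> CARD_SHIFT] = DIRTY;
    }

    /** The collector visits only dirty cards, counts them, and resets them to clean. */
    int scanAndClearDirtyCards() {
        int dirty = 0;
        for (int i = 0; i < cardTable.length; i++) {
            if (cardTable[i] == DIRTY) {
                dirty++;
                cardTable[i] = CLEAN;
            }
        }
        return dirty;
    }

    public static void main(String[] args) {
        CardTableSketch h = new CardTableSketch(4096); // 4096 / 512 = 8 cards
        h.storeRef(0, 42L);
        h.storeRef(8, 43L);    // falls in the same card as offset 0
        h.storeRef(1024, 44L); // falls in card 2
        System.out.println(h.scanAndClearDirtyCards()); // prints 2
        System.out.println(h.scanAndClearDirtyCards()); // prints 0
    }
}
```

In this model every call to storeRef pays for one additional memory write. The approach described in the abstract would instead obtain the same "which regions were written" information from page-table dirty bits, at page rather than card granularity.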
Keywords/Search Tags:Java virtual machine, data-intensive application, Java object layout, garbage collection, data abstraction