Font Size: a A A

Correctness Analysis And Network I/O Reduction For Data-parallel Programs

Posted on:2015-03-04Degree:DoctorType:Dissertation
Country:ChinaCandidate:T XiaoFull Text:PDF
GTID:1228330452969327Subject:Computer Science and Technology
Abstract/Summary:PDF Full Text Request
Thedata-parallelprogrammingmodelhasgaineditspopularityinbigdataprocessingdue to its simplicity. There is however a significant gulf between the sequential thinkingin writing user-defined functions and their parallel execution, which makes user easy tooverlook the nondeterminism and network I/O cost in the parallelism, and introduces cor-rectness and performance issues. I therefore focus on these problems, identify flaws orlimitations in the existing solutions, and propose improved approaches. My main contri-butions include:(1) To tackle the correctness issue caused by nondeterminism, I study507distinctuser-defined reducers found from tens of thousands of production data-parallel programs.My extensive study reveals many non-commutative reducers without any correctness is-sue, which negates the old belief that non-commutative reducers are always bugs. Fur-thermore, I find five non-commutativity patterns and certain implicit data properties thatreducers of some patterns rely on to guarantee determinism. I successfully identify fiverealbugsthathavegoneundetectedforlong, anddiscusspossiblewaystoimprovetestingof data-parallel programs.(2) I propose Cybertron to reduce network I/O for data-parallel programs. While thecurrent state-of-the-art techniques use static program analysis to reduce I/O, Cybertronproposes a new direction that incorporates runtime mechanisms to push the limit furtheron I/O reduction. In particular, Cybertron tracks how data is used in the computationaccurately at runtime to filter unused data at finer granularity dynamically, beyond whatcurrent static-analysis based mechanisms are capable of, and to facilitate a new mecha-nism called constraint based encoding for more efficient encoding. Cybertron has beenimplemented and applied to production data-parallel programs; our extensive evaluationson real programs and real data have shown its effectiveness on I/O reduction over theexisting mechanisms at reasonable CPU cost, and its improvement on end-to-end perfor-mance in various network environments.
Keywords/Search Tags:Data-parallel program, correctness, performance optimization, network I/O
PDF Full Text Request
Related items