Font Size: a A A

Program Synthesis for Empowering End Users and Stress-Testing Compilers

Posted on:2016-05-19Degree:Ph.DType:Thesis
University:University of California, DavisCandidate:Le, Vu MinhFull Text:PDF
GTID:2478390017976155Subject:Computer Science
Abstract/Summary:
Since 2012, the number of people with access to computing devices has exploded. This trend is due to these devices' falling prices, and their increased functionalities and programmability. Two challenges arise from this trend: (1) How to assist the increasing number of device owners, most of whom are non-programmers, yet hope to take full advantage of their devices, and (2) how to make critical software, which is relied upon by other software running on these devices, more reliable? Our vision is that program synthesis is the solution for the above arising challenges. This dissertation describes various program synthesis techniques to synthesize programs from natural languages, input/output examples, and existing programs to tackle these challenges.;We present SmartSynth, a natural language interface that synthesizes smartphone programs from natural language descriptions. SmartSynth enables users to automate their repetitive tasks by leveraging various phone sensors. It is the first system that combines the advances in both natural language processing (NLP) and program synthesis: it uses NLP techniques to parse a given command, and applies program synthesis techniques to resolve parsing ambiguities. We have adapted and extended SmartSynth's algorithms to integrate it into TouchDevelop, a popular touch-based programming environments for end users.;We develop FlashExtract, a system that extracts data from semi-structured document (such as text files, webpages, and spreadsheets) using examples. Natural language does not work well in this domain because the tasks are complicated and users usually do not know how to perform them. In FlashExtract, users only need to highlight some sample regions to be extracted, the system will learn a program to select similar regions automatically. They can also provide nested examples to extract structured, hierarchical data. FlashExtract has been shipped as the cmdlet ConvertFrom-String of PowerShell in Microsoft Windows 10 and as the Custom Fields feature in Microsoft Azure Operational Insights.;While it is important to make programming accessible to end users, it is also vital to improve the correctness of critical software because they impact other software products. We introduce Equivalence Modulo Inputs (EMI), a novel, general methodology to synthesize valid compiler test programs from existing test cases. Given a test program and a set of its inputs, we profile the program's execution over the inputs. We then generate new test variants by randomly removing code that is unexecuted under the provided inputs. The expectation is that these new variants should behave exactly the same as the original program under the same inputs; any observed discrepancy indicates a compiler bug. Our technique is simple to realize yet very effective. In total, we have reported more than 400 bugs in GCC and LLVM, most of which have already been fixed. Many compiler vendors have adopted EMI to test their compilers.;We also believe that cross-disciplinary solutions can increase the impact of program synthesis techniques. We therefore further explore program synthesis in the following three dimensions. First, we introduce two novel user interaction models to help users resolve ambiguities in programming-by-example systems like FlashExtract. One model allows users to effectively navigate among the large set of programs consistent with the provided examples. The other model uses active learning to ask questions to help differentiate the outputs of these programs. Second, we present a guided, advanced mutation strategy based on Bayesian optimization to synthesize EMI variants more effectively. Our improved technique supports both code deletions and insertions in the unexecuted regions, and uses Markov Chain Monte Carlo (MCMC) optimization to guide the synthesis process. Finally, we apply program synthesis to find bugs in a new domain: the link-time optimizer components in compilers.
Keywords/Search Tags:Program synthesis, Users, Compiler, Test, Natural language
Related items