Automatic Builds of Large Software Repositories

Posted on: 2016-09-14
Degree: M.S.
Type: Thesis
University: University of California, Irvine
Candidate: Achar, Rohan
Full Text: PDF
GTID: 2478390017981731
Subject: Information Science
Abstract/Summary:
A large number of open-source projects are hosted on the Internet by popular repository sites such as GitHub, SourceForge, and BitBucket. These repositories are growing in both popularity and size, and many research projects mine them for valuable information.

Compiling the projects found in these repositories can help filter out the good, usable ones. A successful compilation guarantees that the source code is syntactically correct and that all of the project's dependencies are either self-contained or accessible on the Internet. Projects can be maintained and organized in different ways depending on developer culture and practice. Unfortunately, repositories very often fail to capture the environmental assumptions made by the developers, such as build tools, versions, and the presence of external dependencies. This heterogeneity makes successfully compiling large numbers of projects a challenging task, because no single solution can be applied to all of them, and it is impractical to manually correct the compilation of every project. We designed several heuristics to maximize the number of projects in a repository that compile successfully.

Sourcerer is an infrastructure for large-scale collection and analysis of open-source code. It crawls open-source Java projects from various sources on the Internet and builds an aggregated repository and database. We used the information found in the database to automatically attempt compilation of more than 55,000 Java projects in the Sourcerer repository. We propose several general, language-independent heuristics to tackle the most common errors. Using these heuristics, we were able to build 33.18% of the projects in the repository, or more than 18,000 Java projects.
Keywords/Search Tags:Projects, Repository, Large, Repositories