Facilitating internet-scale code retrieval

Posted on:2011-06-15

Degree:Ph.D.

Type:Dissertation

University:University of California, Irvine

Candidate:Bajracharya, Sushil Krishna

GTID:1448390002952244

Subject:Information Science

Abstract/Summary:

Internet-scale code retrieval deals with the representation, storage, and access of relevant source code from the large amount of source code available on the Internet. Internet-scale code retrieval systems support common emerging practices among software developers related to finding and reusing source code. In this dissertation we focus on some system and domain-specific challenges of Internet-scale code retrieval.

This dissertation starts with an in-depth study of how developers use Koders, a commercial code search engine. The results of this study highlight several problems that need to be tackled in a commercial code search engine. To build solutions for some of these problems we develop an infrastructure, Sourcerer, that includes models and tools for large-scale collection and analysis of open source code. The stored contents and set of programmable services in Sourcerer enable rapid development and evaluation of retrieval schemes and applications of code search. We demonstrate the feasibility of developing state-of-the-art Internet-scale code retrieval techniques on top of Sourcerer by presenting the implementation and evaluation details of code-specific retrieval schemes and code search tools.

The central premise of this dissertation is that source code retrieval techniques that incorporate structural information extracted from source code can be more effective in retrieving relevant code entities. We support this premise by presenting three approaches that leverage structural information in code search. First, we present structure-based techniques to improve ranking in retrieving implementations of commonly sought-after programming features, where our best technique outperforms Google and Google Code Search. Second, we present Test-Driven Code Search (TDCS), an approach to finding reusable code fragments on the Internet that uses structure-based code retrieval and dependency slicing, a technique to automatically pull code dependencies. Evaluation of TDCS with 34 students shows that TDCS is the fastest approach to find reusable code fragments for 59% of the students, and faster than Google Code Search for 66% of the students. Finally, we present Structural Semantic Indexing, a technique to associate meaningful terms with source code entities that improves the performance of retrieving code fragments to be used as API usage examples.

Keywords/Search Tags:

Internet-scale code retrieval, Source code, Code search, Code fragments, Code entities, Information

Related items

1	A Code Description Semantics Vector Based Java Code Search
2	Research On Search Based Code Recommendation Techniques
3	Shifting the burden of code optimization to the code producer
4	Research And Application Of Binary Code Similarity Detection Technique Based On Code Embedding
5	Source Code Based Suspicious Code And Bad Programming Practice Detecting
6	Research On Code Search Technology Based On Features Of Code And Comment
7	Research And Implementation Of Commercial Anti-counterfeiting Based On Special-shaped QR Code
8	Research And Implementation Of Automatic Code Summarization And Retrieval Technology For Open Source Reuse
9	Source Certification System And Structure
10	Research On Theory And Application Of Steganography Based On Error-Correcting Code