Feature-based analysis of open source using big data analytics

Posted on: 2016-10-08

Degree: M.S.

Type: Thesis

University: University of Missouri - Kansas City

Candidate: Krishnan, Malathy

GTID: 2478390017480797

Subject: Computer Science

Abstract/Summary:

The open source code base has grown enormously, which makes it extremely difficult to understand the functionality of individual projects. Existing feature discovery approaches that aim to identify functionality are typically semi-automatic and often require human intervention. This thesis proposes a framework that uses machine learning to automatically and dynamically discover the features of any open source project, together with the components that implement them. The overall goal of the approach is an automated and scalable model that produces accurate results.

The first step is to extract the metadata and perform pre-processing. The next step is to dynamically discover topics using Latent Dirichlet Allocation and to form components optimally using K-Means. The final step is to discover the features implemented in the components using the Term Frequency-Inverse Document Frequency (TF-IDF) algorithm. The framework is implemented in Spark, a fast, parallel processing engine for big data analytics. The ArchStudio tool is used to visualize the feature-to-class mapping. As case studies, Apache Solr and Apache Hadoop HDFS are used to illustrate the automatic discovery of components and features. We demonstrated the scalability and accuracy of the proposed model, using a manual evaluation by software architecture experts as the baseline. The accuracy is 85% when compared with the manual evaluation of Apache Solr. In addition, the automated framework discovered many new features in both case studies.
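
The final step described in the abstract, labelling each component with the terms that distinguish it, can be illustrated with a small term frequency-inverse document frequency (TF-IDF) sketch. This is not the thesis's Spark implementation: it is a plain-Python approximation that assumes each component has already been reduced to a list of tokens. The function name `top_terms_by_tfidf`, the component names, and the token lists are all hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def top_terms_by_tfidf(components, k=2):
    """Return the k highest-scoring TF-IDF terms for each component.

    `components` maps a component name to its list of tokens.
    Assumption: each component is treated as one "document".
    """
    n = len(components)

    # Document frequency: in how many components does each term occur?
    df = Counter()
    for tokens in components.values():
        df.update(set(tokens))

    result = {}
    for name, tokens in components.items():
        tf = Counter(tokens)
        total = len(tokens)
        # TF-IDF = (relative term frequency) * log(N / document frequency)
        scores = {
            term: (count / total) * math.log(n / df[term])
            for term, count in tf.items()
        }
        # Highest score first; ties are broken alphabetically.
        ranked = sorted(scores, key=lambda term: (-scores[term], term))
        result[name] = ranked[:k]
    return result


# Hypothetical token lists, not data from the thesis.
components = {
    "component_0": ["query", "parser", "query", "search", "index"],
    "component_1": ["block", "replica", "block", "datanode", "index"],
    "component_2": ["cache", "search", "cache", "evict", "index"],
}

features = top_terms_by_tfidf(components, k=2)
for name, terms in features.items():
    print(name, terms)
```

A term such as `index`, which occurs in every component, receives a score of zero and is therefore never chosen as a distinguishing feature. That is the property that makes TF-IDF a reasonable fit for this labelling step.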

Keywords/Search Tags:

Open source, Using, Features

Related items

1	Research On The Factors Influencing The Survival Status And Survival Prediction Of Open-Source Projects
2	Research And Design On Open Source Community Data Mining Key Technologies
3	Design And Implementation Of Open Source License Automatic Analysis System
4	Research And Implementation Of Open Source Software Systems Situation Analysis
5	Research On Key Technologies Of Web Data Extraction And Mining On Open Source Community
6	On Intellectual Property Protection Of Open Source Software
7	Being open in a closed world: Essays on innovation in open source networks
8	An Approach Of Automatic Fork Summary Generation In Open Source Community Based On Feature Extraction
9	Research And Evaluation Of Open Source OPAC
10	Research On Relationship Between Code Quality And Software Defects For Open Source Software