Automatic Tag Construction And Intelligent Repositories Search For Open Source Software Communities

Posted on: 2018-07-08
Degree: Master
Type: Thesis
Country: China
Candidate: X Y Cai
Full Text: PDF
GTID: 2428330596490051
Subject: Software engineering
Abstract/Summary:
In recent years, with the large-scale development of the software and Internet industries, a variety of communities hosting open-source software repositories have emerged. These communities provide software developers with code management tools and also offer users a search service for open-source repositories. However, because most open-source software repositories contain a large amount of unstructured descriptive information, developers must spend considerable effort filtering open-source resources. Moreover, most current open-source community search engines consider only the relevance between the search query and a repository's name or descriptive text, and ignore the repository's actual function and application environment, known as software characteristics. As a result, users often cannot obtain satisfactory results when they use implicit software characteristics as search keywords.

To address these problems, this paper presents an approach that automatically constructs tags for open-source software communities. The approach analyzes data from different websites in the software engineering domain and runs a label propagation process on a constructed graph model using machine learning methods. Because a tag is a kind of metadata for a concrete object and is widely used in websites and communities, it can help users quickly understand the characteristics of software repositories. This paper therefore also provides a search approach for open-source software repositories that supports software characteristics by using the constructed tags.

The contributions of this paper mainly include:

1) This paper proposes an approach for automatically constructing tags across communities in software engineering. By observing the heterogeneity and similarity of data from different community sites in the software engineering field, the approach analyzes several relevance features of the heterogeneous entities and quantifies the heterogeneous data objects. An entity relation graph model is built, and a semi-supervised machine learning method is applied to complete label propagation across communities.

2) This paper proposes a semantic extension approach for search queries. The approach identifies domain keywords in the original search text using web information and knowledge bases in both specific and general domains. The domain-specific knowledge base is used to match the search keywords to concept nodes and to expand them with semantic synonyms, hypernyms, and hyponyms, generating a set of semantically related terms.

3) This paper proposes a search approach that fuses a manual model with a machine learning model. Because the search objects are software repositories, a variety of relevance features are presented, including the repositories' text corpus and software characteristics, where software characteristics are expressed by domain tags. The approach uses a manual relevance model to filter resources and applies a ranking model to rank the repositories in the candidate set. The manual model ensures a baseline quality for search results, and the approach incorporates user feedback on actual search results.

The study collects experimental data from two websites in the software engineering domain, Stack Overflow and GitHub. We construct semantic tags for tens of thousands of software repositories on GitHub. The F1-Measure of our method is on average about 11.52% higher than that of the next-best method, and the margin can reach about 28.02% under optimal conditions. Compared with traditional methods, our method achieves better prediction accuracy and label richness. Compared with domain-specific and general-purpose search engines, our method produces good search results and also meets the performance requirements.
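The label propagation step mentioned in the abstract can be illustrated with a minimal sketch. This is not the thesis's actual algorithm. It is a generic semi-supervised label propagation over a weighted entity graph, in which seed nodes keep their tags and every other node repeatedly takes the weighted average of its neighbours' tag scores. The function name `propagate_labels`, the node names, the edge weights, and the tags are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def propagate_labels(edges, seed_labels, iterations=20):
    """Generic label propagation sketch (not the thesis's exact method).

    edges:       {(node_u, node_v): weight} for an undirected graph
    seed_labels: {node: {tag: score}} for nodes whose tags are known
    Returns {node: {tag: score}} for every node in the graph.
    """
    # Build a symmetric adjacency map from the undirected edge list.
    adj = defaultdict(dict)
    for (u, v), w in edges.items():
        adj[u][v] = w
        adj[v][u] = w
    for node in seed_labels:
        adj.setdefault(node, {})

    # Start with the seed tags only; unlabeled nodes have no scores yet.
    scores = {node: dict(seed_labels.get(node, {})) for node in adj}

    for _ in range(iterations):
        updated = {}
        for node, nbrs in adj.items():
            if node in seed_labels:
                # Clamp seed nodes so known tags are never overwritten.
                updated[node] = dict(seed_labels[node])
                continue
            total = sum(nbrs.values())
            if total == 0:
                updated[node] = {}
                continue
            # Weighted average of the neighbours' current tag scores.
            acc = defaultdict(float)
            for nbr, w in nbrs.items():
                for tag, s in scores[nbr].items():
                    acc[tag] += w * s / total
            updated[node] = dict(acc)
        scores = updated
    return scores


# Hypothetical toy graph: two tagged Stack Overflow entities, two untagged repositories.
example_edges = {
    ("so:flask", "repo:A"): 1.0,
    ("repo:A", "repo:B"): 0.5,
    ("so:pandas", "repo:B"): 1.0,
}
example_seeds = {
    "so:flask": {"web": 1.0},
    "so:pandas": {"data-analysis": 1.0},
}
example_scores = propagate_labels(example_edges, example_seeds)
for repo in ("repo:A", "repo:B"):
    print(repo, sorted(example_scores[repo].items(), key=lambda kv: -kv[1]))
```

In this toy graph each repository ends up with its strongest tag coming from the community entity it is most closely linked to, while still receiving a weaker score for the tag that reaches it through the other repository.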
Keywords/Search Tags:Tag Construction, Software Repository, Repository Search, Software Characteristics, Open-Source Software Community