Probabilistic Graphical Model Based Code Topic Mining

Posted on: 2016-08-08
Degree: Master
Type: Thesis
Country: China
Candidate: K Jiang
Full Text: PDF
GTID: 2308330476453329
Subject: Computer Science and Technology
Abstract/Summary:
Gaining insight into the source code of a project is always a difficult job. Large open source projects such as GCC and Linux have thousands of files, and it is hard to grasp what the code is about in a short time. The automated analysis of such repositories is important, for instance, to understand software structure, function, complexity, and evolution, as well as to identify relationships between humans and the software they produce.

Unsupervised learning methods such as traditional LDA can explore the topics of a corpus, but the data contained in repositories are not "plain": a code file consists of code, whose form is determined by the programming language, and natural language comments, which are written by the programmers. LDA also cannot tell us what the topics exactly are. If we train a taxonomy on other data, its underlying distribution differs from that of code files because of their special style.

In this paper, we propose a new probabilistic graphical model to explore the topics of repositories. This model does not regard the code files of a repository as plain text, but as a structured data source. Different parts of the repository, such as identifiers, comments, header files, body files, and commits, are combined. The model also simulates a generative process for repositories that is closer to the natural process than that of traditional topic models. In addition, commits serve as additional data that can improve the performance of the model.

However, topic models can only show us word distributions, which is not straightforward enough. For this reason, we built a hierarchical taxonomy of computing based on the Stack Overflow tagging system and developed several methods to turn it into a hierarchical classification system.
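To illustrate what treating a code file as structured data rather than plain text can mean in practice, the following is a minimal Python sketch that separates a C-like source file into two bags of words, one from comments and one from identifiers. It is not the model proposed in the thesis. The function names (`split_identifier`, `split_views`), the regular expressions, and the short keyword list are all assumptions made for illustration, and the sketch ignores string literals and preprocessor directives.

```python
import re
from collections import Counter

# Matches C-style block comments and line comments (simplified assumption:
# comment markers inside string literals are not handled).
COMMENT_RE = re.compile(r"/\*.*?\*/|//[^\n]*", re.DOTALL)
# Matches C-like identifiers.
IDENT_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")
# A small, deliberately incomplete keyword list (illustrative assumption).
KEYWORDS = {"int", "char", "void", "return", "if", "else", "for",
            "while", "struct", "static", "const"}


def split_identifier(name):
    """Split snake_case and camelCase identifiers into lowercase words."""
    words = []
    for part in name.split("_"):
        words += re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|[0-9]+", part)
    return [w.lower() for w in words]


def split_views(source):
    """Return (comment_words, identifier_words) as two bags of words."""
    comments = COMMENT_RE.findall(source)
    code = COMMENT_RE.sub(" ", source)  # code with comments blanked out
    comment_words = Counter(
        w.lower() for c in comments for w in re.findall(r"[A-Za-z]+", c)
    )
    ident_words = Counter(
        w
        for ident in IDENT_RE.findall(code)
        if ident not in KEYWORDS
        for w in split_identifier(ident)
    )
    return comment_words, ident_words


sample = """
/* Parse the header file */
int parse_header(const char *fileName) {
    return 0; // success
}
"""
comment_words, ident_words = split_views(sample)
```

Keeping the two bags separate, instead of merging them into one document, is what would let a topic model assign different word distributions to programmer-written comments and to language-constrained code.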
Keywords/Search Tags: Code Topic Mining, EM Inference, Taxonomy Construction