
Data Mining Application In Software Knowledge Bases

Posted on: 2011-07-19; Degree: Master; Type: Thesis
Country: China; Candidate: Y F Liu
GTID: 2178360305477847; Subject: Computer software and theory
Abstract/Summary:
With the rapid spread of information technology, computer database systems are now used in every area of society, and the data they hold has grown explosively. Faced with massive volumes of data and limited storage capacity, systems must discard much of it, and the knowledge contained in the discarded data goes unused. Data mining emerged in response to this problem and is now widely applied in many fields. It is an interdisciplinary field that integrates databases, artificial intelligence, machine learning, statistics, and other disciplines, and a variety of data mining tools have appeared alongside it.

As software and development technology advance, the size and complexity of software systems grow rapidly, making software development activities difficult to control and manage. In software engineering, traditional methods and simple statistics cannot cope with the explosive growth of software data. Mining useful information from the data produced during the software lifecycle (design documents, source code, code repositories, configuration files, and so on) to guide development and maintenance has therefore become an important topic, and researchers have begun applying data mining techniques to uncover the knowledge hidden in these data. Such information can help programmers in several ways: understanding the software architecture and function call dependencies, discovering bugs as well as latent bugs that have not yet manifested in the current system, modularizing software, reengineering legacy systems, and improving the stability and reliability of the software system. Mining software engineering data has attracted considerable research attention in recent years, and a number of data mining methods have been proposed for each stage of the software lifecycle.
This thesis studies two categories of software knowledge data generated during software development, testing, and maintenance: software version history data (stored in SVN) and bug report data (stored in Bugzilla). The work covers five aspects:

1. Overview of data mining and software engineering. We review the origins, basic tasks, classical algorithms, and application areas of data mining, then introduce software engineering, with emphasis on large software system projects. We discuss how data mining techniques can be applied to software repository data (requirements, design documents, and development, testing, and maintenance records), and we outline some open challenges for future work.

2. Extracting data from software repositories. SVN and CVS are currently the most popular version control tools. A software engineering project generates many source files, design documents, and history records that capture every update made as the system evolves over its lifecycle. Without sufficient experience, developers and maintainers find it hard to know where to start with these data. Since data collection is the foundation of data mining, we design an extractor, Java-RESC, that collects a usable data set in XML format from a software repository (SVN or CVS).

3. Applying repository mining to software development. With the growth of information technology, software systems have become larger than ever, and the sheer volume of source code complicates development and maintenance. Understanding a system of this size and complexity is costly for developers. Developers also frequently face modification tasks in which changing one module requires changes to others, and such dependencies in the source code are difficult to identify with traditional static and dynamic analysis. In this thesis, we use data mining techniques to discover a function call dependence graph from software version histories. We then attach source code annotations to the graph to help developers understand the software architecture and plan source code modifications.

4. Applying repository mining to software maintenance. Software maintenance is the most important stage of the software lifecycle and also the longest. Defects are a critical cause of system instability, so maintainers must fix bugs promptly to keep a system running stably; maintenance has thus become the main way to extend a software system's lifetime. However, as hardware and software grow more complex and documentation is often lacking, maintenance becomes increasingly difficult, especially when the maintainers are not the original developers. To reduce maintainers' workload and help them anticipate potential defects and fix-induced defects, we associate the source code entities modified in the version control system (SVN) with the fixed bugs recorded in the bug tracking system (Bugzilla) during project development. We then apply data mining techniques to classify the defects in bug reports into defects and potential defects, treating potential defects as a key maintenance task, so that maintainers can fix them in time and avoid higher maintenance costs later.

5. The final chapter summarizes the shortcomings of this work and directions for future work.
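The abstract does not specify the XML schema that Java-RESC produces. As a rough, hypothetical illustration of the same extraction step, the Python sketch below reads the XML log that Subversion itself emits with `svn log --xml --verbose` (the `logentry`, `author`, `date`, `msg`, and `path` elements) and turns each revision into a record of changed paths. The `Commit` class and `parse_svn_log` function are names introduced here for illustration; they are not part of Java-RESC.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class Commit:
    """One revision from the version history (hypothetical record type)."""
    revision: int
    author: str
    date: str
    message: str
    paths: list = field(default_factory=list)  # (action, path) tuples


def parse_svn_log(xml_text):
    """Parse the output of `svn log --xml --verbose` into Commit records."""
    root = ET.fromstring(xml_text)
    commits = []
    for entry in root.iter("logentry"):
        # action is A (added), M (modified), D (deleted) or R (replaced)
        paths = [(p.get("action"), (p.text or "").strip()) for p in entry.iter("path")]
        commits.append(Commit(
            revision=int(entry.get("revision")),
            author=entry.findtext("author", default=""),
            date=entry.findtext("date", default=""),
            message=entry.findtext("msg", default="").strip(),
            paths=paths,
        ))
    return commits


# Hand-written sample in the shape of `svn log --xml --verbose` output.
SAMPLE = """<log>
<logentry revision="42">
<author>alice</author>
<date>2010-05-01T10:00:00.000000Z</date>
<paths>
<path action="M" kind="file">/trunk/src/parser.c</path>
<path action="A" kind="file">/trunk/src/lexer.h</path>
</paths>
<msg>Fix bug 1234 in tokenizer</msg>
</logentry>
</log>"""

if __name__ == "__main__":
    for c in parse_svn_log(SAMPLE):
        print(c.revision, c.author, c.message, c.paths)
```

In practice the XML would come from running `svn log --xml --verbose` against a repository URL; the `paths` list of each record then forms one transaction for later mining steps.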
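The abstract does not say which mining algorithm recovers dependencies from version histories. One common approach, sketched here purely as an assumption, is co-change association mining: entities that are repeatedly modified in the same commit are treated as coupled, and a rule A → B is kept when its support (commits changing both A and B) and confidence (support divided by commits changing A) pass chosen thresholds. The function name `co_change_graph` and the default thresholds are hypothetical. The sketch accepts any entity names, so it applies to files or, given a function-level change extractor, to functions. It yields logical (change) coupling rather than a static call graph, and it does not model the source code annotation step.

```python
from collections import Counter
from itertools import combinations


def co_change_graph(transactions, min_support=2, min_confidence=0.5):
    """Mine pairwise co-change rules A -> B from a version history.

    transactions: iterable of collections of entity names (files or
    functions) modified together in one commit.
    Returns {(A, B): (support, confidence)} for every directed rule whose
    support (commits changing both A and B) is at least min_support and
    whose confidence (support / commits changing A) is at least
    min_confidence.
    """
    item_count = Counter()  # number of commits touching each entity
    pair_count = Counter()  # number of commits touching each unordered pair
    for tx in transactions:
        items = sorted(set(tx))  # dedupe and fix a canonical pair order
        item_count.update(items)
        pair_count.update(combinations(items, 2))

    edges = {}
    for (a, b), support in pair_count.items():
        if support < min_support:
            continue
        # A pair yields two directed rules, each with its own confidence.
        for src, dst in ((a, b), (b, a)):
            confidence = support / item_count[src]
            if confidence >= min_confidence:
                edges[(src, dst)] = (support, confidence)
    return edges


if __name__ == "__main__":
    # Hypothetical history: each set holds the functions changed in one commit.
    history = [
        {"parse", "lex"},
        {"parse", "lex", "emit"},
        {"parse", "lex"},
        {"emit"},
    ]
    for (src, dst), (sup, conf) in sorted(co_change_graph(history).items()):
        print(f"{src} -> {dst}: support={sup}, confidence={conf:.2f}")
```

With the sample history, only the `lex`/`parse` pair clears the support threshold, producing rules in both directions with confidence 1.0; `emit` co-changes with each of them only once. Restricting rules to pairs keeps the sketch short; a full association-rule miner such as Apriori would also find larger itemsets.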
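Associating SVN changes with fixed Bugzilla bugs is often done by scanning commit messages for bug references and keeping only numbers that match real bug IDs in the tracker. The Python sketch below is a hypothetical illustration of that heuristic, not the thesis's exact procedure; the regular expression `BUG_REF`, the helper names, and the message conventions it recognizes are all assumptions that would need tuning for a given project. Counting how many fixed bugs touched each file gives one possible input for prioritizing maintenance work; it is not itself the defect classifier the abstract describes.

```python
import re
from collections import Counter, defaultdict

# Hypothetical pattern for bug references such as "bug 123", "Bug #123",
# "fixes #123" or "fixed 123"; real projects need their own conventions.
BUG_REF = re.compile(r"\b(?:bug|fix(?:es|ed)?)\s*#?\s*(\d+)", re.IGNORECASE)


def link_commits_to_bugs(commits, known_bug_ids):
    """Map each referenced bug ID to the set of paths its fix commits changed.

    commits: iterable of (message, changed_paths) pairs from the VCS log.
    known_bug_ids: IDs of fixed bugs exported from the tracker; numbers in
    commit messages that are not known bug IDs are ignored.
    """
    links = defaultdict(set)
    for message, paths in commits:
        for match in BUG_REF.finditer(message):
            bug_id = int(match.group(1))
            if bug_id in known_bug_ids:
                links[bug_id].update(paths)
    return dict(links)


def fix_counts(links):
    """Count how many distinct linked bugs touched each path."""
    counts = Counter()
    for paths in links.values():
        counts.update(paths)
    return counts


if __name__ == "__main__":
    commits = [
        ("Fix bug 101: null check in parser", ["src/parser.c"]),
        ("fixes #102, also touches lexer", ["src/lexer.c", "src/parser.c"]),
        ("Refactor build, see revision 9999", ["Makefile"]),
        ("bug 555 workaround", ["src/io.c"]),
    ]
    links = link_commits_to_bugs(commits, known_bug_ids={101, 102})
    print(links)
    print(fix_counts(links).most_common())
```

Filtering against `known_bug_ids` is what discards the reference to bug 555 in the sample, which stands in for a number that does not correspond to a fixed bug in the tracker. The word boundary in `BUG_REF` keeps words such as "prefix" from being read as a fix reference.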
Keywords: Data Mining, Software Engineering, Software Knowledge Bases, Software Developers, Software Maintenance