Early Software Effort Estimation Supported By Semantic Analysis Of Requirement Documents

Posted on:2018-06-06

Degree:Master

Type:Thesis

Country:China

Candidate:S S Tong

Full Text:PDF

GTID:2428330596490043

Subject:Software engineering

Abstract/Summary:

With the growth of computer science and software engineering, more and more software projects are being proposed, yet almost 70% of them are challenged or fail. The most important reason is error in effort estimation. Software effort estimation is divided into early-phase and middle-phase estimation. Early-phase estimation is the harder of the two because requirements and design are not yet clear at that point, and it has a significant influence on the whole software project.

To improve the precision and convenience of early-phase effort estimation, this thesis proposes an effort estimation method based on semantic analysis of requirement documents. First, it performs chunk-level and word-level semantic analysis of the requirement documents and extracts local and global features to represent the complexity of each requirement item. Then it uses historical data and a regression algorithm to train a size estimation model and an effort estimation model, which estimate the size and effort of a new project. To address the shortage of historical data, the thesis trains the models on cross-organization data, which it preprocesses with normalization, canonical correlation analysis (CCA) and restricted Boltzmann machines (RBM).

The main contributions and innovations of this thesis are:

1) A framework for universal early-phase effort estimation. The framework combines natural language processing and machine learning, and it can be applied to software requirement documents written in different natural languages.

2) A two-level semantic analysis method for requirement documents. The first level is chunk-level and is implemented by automatic entity extraction; it assigns a semantic label to each requirement item. The second level is word-level and is implemented by a propagation-based method; it considers synonym and part-of relations between words in the requirements.

3) A method for improving the accuracy of effort estimation in iteratively developed projects. For such projects, the research finds that data from previous iterations has more influence than data from other projects. The thesis uses transfer learning to combine previous-iteration data with other projects' data when estimating project size, which improves precision.

4) A method for handling heterogeneous cross-company data in early-phase effort estimation. The method uses CCA and RBM to improve the precision of effort estimation, and it also combines the two to improve accuracy further.

Finally, to validate the two-level semantic analysis of requirement documents and the heterogeneous cross-organization early-phase effort estimation method, the thesis performs a series of experiments on data from 39 industrial projects from 5 companies and on 5 commonly used datasets from the PROMISE data repository. The results show that the chunk-level semantic analysis method is better in precision than the BR and SVR methods, and that recall and F-score increase by more than 0.02. The word-level semantic analysis method is better than the BOW, LSA and LDA methods: MMRE decreases by 0.03, VAR remains the same, and PRED(25) increases by 0.05. The cross-organization heterogeneous data method is better than KNN: MMRE decreases by 0.5, PRED(25) increases by 0.12, and MdMRE decreases by 0.12.
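
The abstract reports its results in terms of MMRE, MdMRE and PRED(25) without defining them. The sketch below assumes the conventional definitions used in effort-estimation work: the magnitude of relative error (MRE) of one project is |actual - predicted| / actual, MMRE and MdMRE are the mean and median of the MREs, and PRED(25) is the fraction of projects whose MRE is at most 0.25. The function names and the sample numbers are illustrative assumptions, not taken from the thesis.

```python
from statistics import mean, median


def mre(actual, predicted):
    """Magnitude of relative error for a single project (actual must be > 0)."""
    return abs(actual - predicted) / actual


def evaluate(actuals, predictions, threshold=0.25):
    """Return MMRE, MdMRE and PRED(threshold) for paired effort values.

    Assumes the conventional definitions; the thesis itself does not
    spell them out in the abstract.
    """
    errors = [mre(a, p) for a, p in zip(actuals, predictions)]
    return {
        "MMRE": mean(errors),      # mean MRE, lower is better
        "MdMRE": median(errors),   # median MRE, lower is better
        # share of projects estimated within the threshold, higher is better
        "PRED": sum(e <= threshold for e in errors) / len(errors),
    }


# Hypothetical efforts (for example, person-hours) for four projects.
actual_effort = [100, 200, 400, 50]
predicted_effort = [110, 150, 400, 80]
scores = evaluate(actual_effort, predicted_effort)
print(scores)
```

Under these definitions, a decrease in MMRE or MdMRE and an increase in PRED(25) both indicate a more accurate estimator, which is how the comparisons against BOW, LSA, LDA and KNN above should be read.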

Keywords/Search Tags:

Semantic Analysis of Requirement Documents, Early Software Effort Estimation, Heterogeneous Data Processing, Transfer Learning, Machine Learning

Related items

1	Research On Data Drought Key Techniques For Software Effort Data Based On Machine Learning
2	An Experience-Based Model For Test Execution Effort Estimation
3	Transfer Learning Across Heterogeneous Feature Spaces
4	Research Of Machine Learning Algorithms On Heterogeneous Data
5	Heterogeneous Transfer Learning Between Image And Text Data
6	FPA-Based Software Effort Estimation Research And Practice
7	Research On Multi-domain Oriented Heterogeneous Relational Data Transfer Learning Bounds
8	Research On Financial Early Warning Of Listed Companies Based On Machine Learning
9	Learning With Unlabeled Data Based Research On Software Quality Assurance
10	The Design And Implementation Of Early-warning System Based On Machine Learning