
Research on a Fuzzy Hierarchical Clustering Algorithm and Its Application in Software Re-architecting

Posted on: 2018-06-17
Degree: Master
Type: Thesis
Country: China
Candidate: S Gao
GTID: 2348330536973579
Subject: Software engineering
Abstract/Summary:
The big data era has made far more knowledge available. As a tool for knowledge discovery, data mining has become a basic theory that scholars rely on, and hierarchical clustering is an important part of it. In recent years, hierarchical clustering has been applied in many fields of study. With the rapid development of computer technology, legacy systems, which make up a major share of software systems, find it increasingly difficult to meet complex functional requirements. Whether to keep a legacy system has become a headache for enterprises. Moreover, the long development history of legacy systems, the loss of development documents, and disordered system structure are stumbling blocks to their further development. Re-architecting the software is the main way to solve these problems.

At present there are two kinds of methods for software re-architecting: (1) mathematical modeling methods and (2) clustering methods. Clustering-based re-architecting is in turn divided into re-architecting based on density clustering and re-architecting based on hierarchical clustering. In recent years, hierarchical clustering has become the main research method for re-architecting legacy systems. However, it still has two disadvantages: (1) the binary (two-valued) relation between an entity and its features cannot distinguish how strongly each feature influences the entity; (2) hierarchical clustering based on distance computation clusters poorly. To address these problems, this thesis improves traditional hierarchical clustering and proposes a new algorithm, the Fuzzy Hierarchical Clustering algorithm Based on Information Loss (FHCBIL). The algorithm improves both the assignment of entity feature weights and the similarity calculation. This addresses the poor performance of traditional hierarchical clustering on irregular data sets and its low accuracy in classifying entities. The FHCBIL algorithm is then applied to software re-architecting to generate a new software architecture.

The main contributions of the thesis are as follows:

(1) A software re-architecting model based on fuzzy hierarchical clustering is built. Entity partitioning based on plain hierarchical clustering is not accurate enough. To solve this problem, the thesis combines fuzzy relations with hierarchical clustering and builds a system model for software re-architecting on that basis. The system model consists of three modules: data extraction, fuzzy hierarchical clustering, and evaluation. The data extraction module extracts data from the source code of the software system. The fuzzy hierarchical clustering module generates the new software architecture. The evaluation module assesses the rationality of the resulting architecture and the correctness of the entity partition.

(2) An improved fuzzy hierarchical clustering algorithm, FHCBIL, is given. The traditional hierarchical clustering algorithm has two deficiencies. First, the relation between an entity and an entity feature is binary, so it cannot distinguish how much each feature influences the clustering of the entity. Second, distance-based similarity calculation lowers classification accuracy. FHCBIL addresses each deficiency in turn. First, it extends the weight assignment of entity features: the weight of a feature is divided into a global weight and a local weight, so that a feature belongs to an entity with a degree of membership, and the relation between entity and feature becomes a fuzzy relation. Second, it uses information loss as the similarity measure. The algorithm is divided into three parts: data preprocessing, FHCBIL clustering, and tree construction. In data preprocessing, the data set is first standardized with the Z-score method, and feature weights are then assigned to construct the entity feature vectors. In the clustering part, each data object first forms its own cluster, the similarity between entities is then calculated by information loss, and finally the most similar entities are merged into a new cluster. In tree construction, the entity feature vectors, the number of clusters, and the number of layers are updated, and merging is repeated until the number of clusters and the number of layers reach the given thresholds, at which point the tree structure is complete.

(3) A software architecture reconstruction system based on the FHCBIL algorithm is implemented. An object-oriented legacy system is selected, its source code is converted, and data are extracted from it. The extracted data set is then clustered with the FHCBIL algorithm to generate the tree structure and so re-architect the legacy system.

To verify the performance of FHCBIL and the quality of the architecture it produces, the thesis carries out two evaluations. For clustering performance, the Jaccard coefficient (JC), the Fowlkes-Mallows index (FMI), and the Rand index (RI) are chosen as evaluation indexes, and the results on data from the selected legacy system are compared with those of classical clustering algorithms. The experimental results show that FHCBIL has good clustering performance. For architecture quality, several hierarchical clustering algorithms commonly used for software re-architecting are used alongside FHCBIL to reconstruct the software architecture. The experimental results show that the architecture produced by FHCBIL is reasonable: it has high cohesion and low coupling, and its classification accuracy is high.
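The three evaluation indexes named in the abstract are standard pair-counting measures, and a minimal sketch may help show what they compare. The code below is not taken from the thesis; the function names `pair_counts` and `clustering_indices` are hypothetical, and it assumes each entity has one reference label (for example, its module in an expert decomposition) and one predicted cluster label.

```python
from itertools import combinations
from math import sqrt


def pair_counts(truth, pred):
    """Classify every pair of entities by whether the two labelings agree.

    Returns (a, b, c, d):
      a: pair grouped together in both labelings
      b: pair grouped together only in the predicted clustering
      c: pair grouped together only in the reference labeling
      d: pair separated in both labelings
    """
    if len(truth) != len(pred):
        raise ValueError("labelings must have the same length")
    a = b = c = d = 0
    for i, j in combinations(range(len(truth)), 2):
        same_truth = truth[i] == truth[j]
        same_pred = pred[i] == pred[j]
        if same_truth and same_pred:
            a += 1
        elif same_pred:
            b += 1
        elif same_truth:
            c += 1
        else:
            d += 1
    return a, b, c, d


def clustering_indices(truth, pred):
    """Jaccard coefficient, Fowlkes-Mallows index and Rand index."""
    a, b, c, d = pair_counts(truth, pred)
    total = a + b + c + d
    # Degenerate inputs (no pairs, or no co-clustered pairs) get fixed
    # fallback values instead of raising ZeroDivisionError.
    jc = a / (a + b + c) if (a + b + c) else 1.0
    fmi = a / sqrt((a + b) * (a + c)) if (a + b) and (a + c) else 0.0
    ri = (a + d) / total if total else 1.0
    return {"JC": jc, "FMI": fmi, "RI": ri}
```

For example, with reference labels `["ui", "ui", "db", "db", "net"]` and predicted clusters `[0, 0, 1, 1, 1]`, the pair counts are a = 2, b = 2, c = 0, d = 6, which gives JC = 0.5, FMI ≈ 0.707 and RI = 0.8. All three indexes lie in [0, 1], and higher values mean closer agreement with the reference partition.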
Keywords/Search Tags: fuzzy theory, hierarchical clustering, FHCBIL algorithm, legacy system, reverse engineering