
Clustering Algorithm Based On Semantic Similarity Analysis Of The Soft Component

Posted on: 2011-03-08; Degree: Master; Type: Thesis
Country: China; Candidate: Y P Ren
GTID: 2208360308971798; Subject: Computer software and theory
Abstract/Summary:
With the development of software technology and the rise of the reuse concept, software reuse has become a research hotspot in software engineering. Software reuse is an effective way to improve both the efficiency of software development and the quality of software. To make software reuse truly systematic and engineered, a comprehensive and efficient component library system must be established to manage components effectively. Component classification and component retrieval are the two core problems of component management, and a reasonable classification representation of components is the foundation and precondition of efficient component retrieval. Building on the faceted classification representation of components, this thesis applies cluster analysis and semantic analysis techniques to achieve a more objective component classification.

Faceted classification, currently a commonly used representation method, has several shortcomings: its term space depends on domain expertise, and it is strongly subjective. To address these shortcomings, components are described by the faceted classification representation combined with full-text retrieval, and a component-clustering algorithm based on the Latent Semantic Analysis (LSA) model is proposed to classify them. This algorithm clusters components at a certain semantic level and overcomes the high dimensionality and sparsity of the Vector Space Model (VSM).
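A minimal sketch of the LSA-based clustering step, assuming each component's faceted description and full text have already been tokenized into a list of terms. Components are weighted with classic TF-IDF, projected onto the top singular vectors of the term-by-component matrix (truncated SVD, the core of LSA), length-normalised, and then grouped. The abstract does not name the clustering step, so k-means here is an assumption, and `tfidf_matrix`, `lsa_embed`, `kmeans` and `cluster_components` are hypothetical names.

```python
import numpy as np

# Hypothetical helper names; using k-means as the clustering step is an assumption.

def tfidf_matrix(docs):
    """Term-by-component TF-IDF matrix: rows are terms, columns are components."""
    vocab = sorted({w for doc in docs for w in doc})
    index = {w: i for i, w in enumerate(vocab)}
    tf = np.zeros((len(vocab), len(docs)))
    for j, doc in enumerate(docs):
        for w in doc:
            tf[index[w], j] += 1
    df = np.count_nonzero(tf, axis=1)            # components containing each term
    idf = np.log(len(docs) / df)
    return tf * idf[:, None], vocab

def lsa_embed(X, k):
    """Project each component into a k-dimensional latent semantic space."""
    _, s, Vt = np.linalg.svd(X, full_matrices=False)
    return Vt[:k].T * s[:k]                      # shape (n_components, k)

def kmeans(Z, k, iters=100):
    """Plain k-means with deterministic farthest-point initialisation."""
    centroids = [Z[0]]
    for _ in range(1, k):
        dist = np.min([np.linalg.norm(Z - c, axis=1) for c in centroids], axis=0)
        centroids.append(Z[np.argmax(dist)])
    centroids = np.array(centroids)
    for _ in range(iters):
        dists = np.linalg.norm(Z[:, None, :] - centroids[None, :, :], axis=2)
        labels = np.argmin(dists, axis=1)
        updated = np.array([Z[labels == j].mean(axis=0) if np.any(labels == j)
                            else centroids[j] for j in range(k)])
        if np.allclose(updated, centroids):
            break
        centroids = updated
    return labels

def cluster_components(docs, k_latent, n_clusters):
    """TF-IDF -> truncated SVD (LSA) -> length normalisation -> k-means."""
    X, _ = tfidf_matrix(docs)
    Z = lsa_embed(X, k_latent)
    norms = np.linalg.norm(Z, axis=1, keepdims=True)
    Z = Z / np.where(norms == 0, 1.0, norms)     # compare directions (cosine-like)
    return kmeans(Z, n_clusters)
```

With six components whose terms split into a sorting group and a networking group, `cluster_components(docs, k_latent=2, n_clusters=2)` places the two groups in separate clusters. Working in `k_latent` dense dimensions rather than one sparse dimension per term is what sidesteps the VSM's high dimensionality and sparsity.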
Experiments show that the LSA-based component-clustering algorithm effectively improves clustering quality, yields a more reasonable component classification, and better supports component retrieval.

To further improve component clustering, a new component-clustering algorithm based on semantic similarity and optimization is proposed, drawing on semantic analysis techniques from natural language processing and on the optimization strategy of the Genetic Algorithm. Semantic analysis reduces subjectivity and yields a more objective component classification, while the optimization strategy increases intra-class compactness and cohesion, improving clustering quality and producing a more reasonable classification. During clustering, to obtain better feature-word weights, an improved TF-IDF method that incorporates semantics is proposed; it addresses a shortcoming of the traditional TF-IDF method, which assumes that feature words are mutually independent and unrelated. This TF-IDF method is then applied in the similarity- and optimization-based component-clustering algorithm, yielding a better component classification. Compared separately with component clustering based on the VSM and on the LSA model, the proposed algorithm further improves clustering, makes the component classification more objective and reasonable, provides better support for component retrieval, and thereby helps reduce the cost of software reuse and promote it.

As a core problem of the component library, component classification has been widely studied in the field of software engineering.
This thesis uses cluster analysis from data mining to classify components automatically, and adopts semantic analysis together with the optimization strategy of the Genetic Algorithm to make the classification more objective and reasonable. However, research on component clustering is still scarce, and semantic analysis technology is still at the research stage, so component clustering from a semantic perspective leaves much room for further study and development.
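The similarity-and-optimization algorithm can be sketched only under several assumptions, since the abstract gives no formulas. The improved TF-IDF is modelled here as a hypothetical `semantic_tfidf` that raises each term's weight by `alpha` times the weights of related terms in the same component, using an assumed term-similarity matrix `term_sim`. Clustering is a generic Genetic Algorithm: each chromosome assigns a cluster label to every component, fitness is mean intra-cluster cosine similarity minus mean inter-cluster similarity (rewarding compact, well-separated classes), and tournament selection, uniform crossover, mutation and elitism are standard GA choices rather than the thesis's own operators.

```python
import numpy as np

# Hypothetical sketch: the weighting formula and the GA operators below are
# illustrative assumptions, not the thesis's published definitions.

def semantic_tfidf(tfidf, term_sim, alpha=0.5):
    """Boost each TF-IDF weight by the weights of related terms in the same component.

    tfidf:    (n_components, n_terms) classic TF-IDF matrix.
    term_sim: (n_terms, n_terms) symmetric term similarity in [0, 1], assumed given.
    """
    related = term_sim - np.diag(np.diag(term_sim))   # drop self-similarity
    present = (tfidf > 0).astype(float)               # only adjust terms that occur
    return tfidf + alpha * present * (tfidf @ related)

def cosine_matrix(W):
    """Pairwise cosine similarity between component vectors (rows of W)."""
    norms = np.linalg.norm(W, axis=1, keepdims=True)
    U = W / np.where(norms == 0, 1.0, norms)
    return U @ U.T

def fitness(labels, sim, k):
    """Mean intra-cluster similarity minus mean inter-cluster similarity."""
    if np.unique(labels).size < k:
        return -np.inf                                # every cluster must be used
    same = labels[:, None] == labels[None, :]
    intra = same & ~np.eye(len(labels), dtype=bool)
    if not intra.any():
        return -np.inf                                # all singletons: undefined
    inter = sim[~same].mean() if (~same).any() else 0.0
    return sim[intra].mean() - inter

def ga_cluster(sim, k, pop_size=40, generations=100, p_mut=0.1, seed=0):
    """Evolve label vectors (one gene per component) that maximise fitness()."""
    rng = np.random.default_rng(seed)
    n = sim.shape[0]
    pop = rng.integers(0, k, size=(pop_size, n))
    for _ in range(generations):
        scores = np.array([fitness(ind, sim, k) for ind in pop])
        children = [pop[np.argmax(scores)].copy()]   # elitism: keep the best
        while len(children) < pop_size:
            # Binary tournament selection for each parent.
            a = rng.integers(0, pop_size, 2)
            b = rng.integers(0, pop_size, 2)
            p1, p2 = pop[a[np.argmax(scores[a])]], pop[b[np.argmax(scores[b])]]
            child = np.where(rng.random(n) < 0.5, p1, p2)   # uniform crossover
            mutate = rng.random(n) < p_mut                  # random reassignment
            child[mutate] = rng.integers(0, k, mutate.sum())
            children.append(child)
        pop = np.array(children)
    scores = np.array([fitness(ind, sim, k) for ind in pop])
    return pop[np.argmax(scores)]
```

Because the fitness only rewards how components are grouped, cluster numbering is arbitrary: a result of `[1, 1, 1, 0, 0, 0]` is equivalent to `[0, 0, 0, 1, 1, 1]`.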
Keywords/Search Tags: Component Clustering, Faceted Classification, Latent Semantic Analysis, TF-IDF, Semantic Similarity, Clustering Optimization