Efficient Retrieval Method Suitable For Large-scale Component Library Research

Posted on: 2013-01-08
Degree: Master
Type: Thesis
Country: China
Candidate: L Zhang
GTID: 2218330374963604
Subject: Computer application technology
Abstract/Summary:
As a critical factor in software reuse, component-based technology is very important in software development. Applying component-based techniques can substantially improve the productivity and reliability of software development and reduce its cost. The idea of software reuse is also well represented in software service applications such as Web Services and Active Services. The development of service-oriented techniques places higher demands on service efficiency: repositories depend on the component management and retrieval system to supply the components that services need quickly and accurately. As component repositories scale up, traditional component retrieval approaches can no longer meet the requirements of dynamic-demand services efficiently. How to keep recall and precision high while substantially reducing retrieval time remains an open question that needs to be solved.

By analyzing the advantages of faceted component classification and of full-text retrieval, this paper proposes a component retrieval approach based on a functional inverted index and full-text retrieval of component description documents. First, the inverted index of the functional facet is used to exclude functionally irrelevant components; then an improved VSM (Vector Space Model) algorithm is used to calculate the similarity between the retrieval keywords and the component documents.
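The two-stage pipeline described above (functional facet filter first, similarity ranking second) can be sketched as follows. The abstract does not say how the VSM algorithm was improved, so this sketch substitutes standard TF-IDF cosine similarity. The toy repository, facet values, tokenizer and function names are all hypothetical illustrations, not the thesis's implementation.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy repository: component id -> (functional facet values, description text).
COMPONENTS = {
    "c1": ({"sort"}, "stable merge sort for large integer arrays"),
    "c2": ({"sort", "search"}, "binary search and quick sort utilities for arrays"),
    "c3": ({"parse"}, "xml parser with streaming event interface"),
    "c4": ({"search"}, "full text search over document collections"),
}


def tokenize(text):
    """Lower-case whitespace tokenizer (assumption: no stemming or stop-word removal)."""
    return text.lower().split()


def build_facet_index(components):
    """Inverted index on the functional facet: facet value -> set of component ids."""
    index = defaultdict(set)
    for cid, (facets, _) in components.items():
        for facet in facets:
            index[facet].add(cid)
    return index


def tfidf_vectors(components):
    """Standard TF-IDF weights per description (a stand-in for the thesis's improved VSM)."""
    docs = {cid: Counter(tokenize(text)) for cid, (_, text) in components.items()}
    n = len(docs)
    df = Counter(term for tf in docs.values() for term in tf)
    idf = {term: math.log(n / count) + 1.0 for term, count in df.items()}
    vectors = {cid: {t: c * idf[t] for t, c in tf.items()} for cid, tf in docs.items()}
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, facet, facet_index, vectors, idf):
    # Stage 1: the facet index excludes functionally irrelevant components.
    candidates = facet_index.get(facet, set())
    # Stage 2: rank the remaining components by similarity to the query keywords.
    # Query terms unseen in the repository get weight 0.
    q_vec = {t: c * idf.get(t, 0.0) for t, c in Counter(tokenize(query)).items()}
    scored = [(cid, cosine(q_vec, vectors[cid])) for cid in candidates]
    return sorted(scored, key=lambda x: (-x[1], x[0]))
```

With this toy data, `retrieve("sort arrays", "sort", ...)` ranks c1 above c2 and never scores c3 or c4, because the facet filter removes them before any similarity is computed. Filtering first keeps the comparatively expensive vector comparison confined to a small candidate set.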
This approach effectively overcomes the subjectivity of faceted classification and improves the precision of component retrieval by searching the component description documents. Combined with the high efficiency of the index and the improved VSM similarity algorithm, it increases the recall ratio while decreasing retrieval time significantly.

To further enhance retrieval efficiency in large-scale component repositories, this paper proposes a novel component retrieval approach, Automatic Tags Extraction (ATE) retrieval. In this method, component tags are first extracted automatically from the application domain terms, high-frequency terms, high-weight terms and facet terms in the component's description document, and the Vector Space Model is then used to retrieve on these tags. To improve retrieval speed, the method uses a combined index consisting of an Inverted Index of Functional Facet (IIFF) and a Ranked Index of Component Tags (RICT). The VSM similarity algorithm is further improved based on the component tags to enhance retrieval precision. With better retrieval effectiveness and high flexibility, ATE retrieval adapts better to the growth of the component repository.

Finally, the feasibility and efficiency of the proposed approaches are validated through several experiments comparing them with four common retrieval methods. To validate their adaptability to large-scale component repositories, simulated repositories of different sizes are built using a random selection algorithm, and the retrieval time, the pre-processing time of ATE and the space cost of ATE are analyzed by retrieving in each of them. The experimental results show that the proposed approaches achieve better component retrieval effectiveness at a lower time and space cost, which makes them well suited to retrieval in large-scale component repositories.
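The ATE retrieval method described above can be illustrated with a minimal sketch: tags are drawn from domain terms, high-frequency terms, high-weight terms and facet terms, then stored in an IIFF plus a RICT. The abstract gives no thresholds or tag-weighting formula, so the domain-term list, stop words, cutoff `k`, TF-IDF tag weights and additive scoring are all assumptions. The names `DOMAIN_TERMS`, `extract_tags`, `build_indexes` and `ate_retrieve` are hypothetical.

```python
import math
from collections import Counter, defaultdict

# Hypothetical inputs: the repository, domain-term list, and stop words are illustrative only.
COMPONENTS = {
    "c1": ({"sort"}, "stable merge sort for large integer arrays"),
    "c2": ({"sort", "search"}, "binary search and quick sort utilities for arrays"),
    "c3": ({"parse"}, "xml parser with streaming event interface"),
    "c4": ({"search"}, "full text search over document collections"),
}
DOMAIN_TERMS = {"xml", "sort", "search", "parser"}
STOP_WORDS = {"and", "for", "with", "over", "the", "a"}


def tokenize(text):
    return [t for t in text.lower().split() if t not in STOP_WORDS]


def extract_tags(tokens, facets, df, n_docs, k=2):
    """Tags = domain terms + top-k frequent terms + top-k TF-IDF terms + facet terms (assumed rule)."""
    tf = Counter(tokens)
    domain = {t for t in tf if t in DOMAIN_TERMS}
    frequent = {t for t, _ in sorted(tf.items(), key=lambda x: (-x[1], x[0]))[:k]}
    weight = {t: c * (math.log(n_docs / df[t]) + 1.0) for t, c in tf.items()}
    heavy = {t for t, _ in sorted(weight.items(), key=lambda x: (-x[1], x[0]))[:k]}
    tags = domain | frequent | heavy | set(facets)
    # Assumed tag weight: the term's TF-IDF weight, or 1.0 for facet terms absent from the text.
    return {t: weight.get(t, 1.0) for t in tags}


def build_indexes(components, k=2):
    """Build IIFF (facet -> ids) and RICT (tag -> [(id, weight)] sorted by weight, highest first)."""
    token_lists = {cid: tokenize(text) for cid, (_, text) in components.items()}
    df = Counter(t for toks in token_lists.values() for t in set(toks))
    iiff, rict = defaultdict(set), defaultdict(list)
    for cid, (facets, _) in components.items():
        for f in facets:
            iiff[f].add(cid)
        for tag, w in extract_tags(token_lists[cid], facets, df, len(components), k).items():
            rict[tag].append((cid, w))
    for postings in rict.values():
        postings.sort(key=lambda p: (-p[1], p[0]))
    return iiff, rict


def ate_retrieve(query_tags, facet, iiff, rict, top_n=10):
    # IIFF narrows the candidates to components with the requested functional facet.
    candidates = iiff.get(facet, set())
    scores = defaultdict(float)
    # Postings are ranked by weight, so a top-n search could stop early; this sketch scans them fully.
    for tag in query_tags:
        for cid, w in rict.get(tag, []):
            if cid in candidates:
                scores[cid] += w
    return sorted(scores.items(), key=lambda x: (-x[1], x[0]))[:top_n]
```

For example, querying the tags `["sort", "integer"]` under the facet `sort` returns c1 first, since it carries both tags, followed by c2. Because all tag extraction happens at indexing time, a query only touches the short posting lists of its own tags. This illustrates why the abstract reports ATE's pre-processing time and space cost separately from its retrieval time.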
Keywords/Search Tags: Software Components, Components Retrieval, Faceted Classification, Inverted Index, ATE, VSM