
An Algorithm Research On Component Description Extraction Internet‐based Component Library System

Posted on: 2013-04-13
Degree: Master
Type: Thesis
Country: China
Candidate: Z P Zhou
GTID: 2248330392457828
Subject: Computer software and theory
Abstract/Summary:
Component-based software development is regarded as a practical way to address the software crisis. The approach presupposes a large supply of components to draw on, and a number of component download sites have emerged on the Internet to meet that need. These resources, however, are scattered across many sites, which makes it difficult to find components accurately and comprehensively. Extracting component descriptions from these sites, reorganizing and integrating them, and then offering a unified portal for accessing the components is therefore of considerable practical value.

Building an Internet-based component library system involves three key technologies:
1) Topic (focused) crawling, which collects web documents containing component information from the Internet;
2) Information extraction, which automatically obtains component descriptions from web pages and converts them into a semantically clearer, more structured format;
3) Component retrieval, which annotates, mines, organizes, and stores the collected information and provides a portal through which users can search it.

This study focuses on how to automatically obtain component descriptions from web pages.
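The first of these technologies, topic (focused) crawling, can be pictured as a best-first search that only expands pages judged relevant to the component topic. The sketch below is an illustrative assumption, not the crawler used in the thesis: an in-memory `web` dictionary stands in for real HTTP fetching, and `TOPIC_TERMS`, `relevance`, `focused_crawl`, and the 0.2 threshold are all hypothetical.

```python
import heapq
import re

# Hypothetical topic vocabulary for component download pages (illustrative only).
TOPIC_TERMS = frozenset({"component", "control", "library", "download",
                         "version", "license", "activex", "dll"})


def relevance(text):
    """Fraction of word tokens in `text` that belong to the topic vocabulary."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    if not tokens:
        return 0.0
    return sum(t in TOPIC_TERMS for t in tokens) / len(tokens)


def focused_crawl(web, seed, threshold=0.2, max_pages=100):
    """Best-first crawl over `web`, a dict mapping url -> (page_text, outlinks).

    Pages scoring below `threshold` are discarded and their links are not followed.
    """
    frontier = [(-1.0, seed)]  # heapq is a min-heap, so scores are negated
    seen = {seed}
    collected = []
    while frontier and len(collected) < max_pages:
        _, url = heapq.heappop(frontier)
        text, links = web.get(url, ("", []))
        score = relevance(text)
        if score < threshold:
            continue  # off-topic: neither keep the page nor expand its links
        collected.append(url)
        for link in links:
            if link not in seen:
                seen.add(link)
                # A link inherits the relevance of the page that points to it.
                heapq.heappush(frontier, (-score, link))
    return collected


if __name__ == "__main__":
    web = {
        "seed": ("component library download component control", ["a", "b"]),
        "a": ("activex control component version download", ["c"]),
        "b": ("weather news sports today", ["d"]),
        "c": ("dll component license", []),
        "d": ("component download", []),
    }
    print(focused_crawl(web, "seed"))  # ['seed', 'a', 'c']
```

Because off-topic pages are neither kept nor expanded, a relevant page reachable only through an off-topic one (`d` in the example) is never visited; a focused crawler accepts this loss of recall in exchange for a much smaller crawl.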
Automatically obtaining component descriptions is the crux of this research: the system must analyze rough, mixed, and disorganized web pages, extract an effective component description from them, and then organize that description in a reasonable way for subsequent component mining and retrieval.

Current Web information extraction algorithms fall into three categories: those based on HTML structure, those based on wrapper induction, and those based on Web page semantic analysis. Because these algorithms have shortcomings when applied to component web pages, this thesis proposes a topic-based similarity approximation algorithm for component information extraction. By introducing a semantic dictionary into the extraction model, the algorithm masks the differences in how different component library websites describe components. It also exploits the observation that component descriptions are often gathered under a specific tag node: it computes the topic similarity of each tag until the maximum similarity is reached, which allows the description to be located precisely and extracted. Extensive experiments show that the algorithm maintains a high extraction rate while greatly reducing the degree of human intervention, and that it adapts well to dynamic changes in website structure.
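The extraction idea described above, a semantic dictionary plus a search for the tag with the highest topic similarity, can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's algorithm: the contents of `SEMANTIC_DICT` are hypothetical, "topic similarity" is modelled here as cosine similarity between a bag-of-words vector (after dictionary normalization) and a topic vector of canonical field names, and every element is scored exhaustively rather than in whatever search order the thesis uses. The names `Node`, `TreeBuilder`, `normalize`, `cosine`, and `best_node` are illustrative.

```python
import math
import re
from collections import Counter
from html.parser import HTMLParser

# Hypothetical semantic dictionary: site-specific labels -> canonical field names.
SEMANTIC_DICT = {
    "language": "language", "lang": "language",
    "platform": "platform", "os": "platform",
    "license": "license", "licence": "license",
    "version": "version", "ver": "version",
    "price": "price", "cost": "price",
}
# Assumed topic vector: each canonical field counted once.
TOPIC = Counter(set(SEMANTIC_DICT.values()))
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}


class Node:
    def __init__(self, tag, parent=None):
        self.tag, self.parent = tag, parent
        self.children, self.text = [], []

    def all_text(self):
        """Text of this node and all of its descendants."""
        return " ".join(self.text + [c.all_text() for c in self.children])


class TreeBuilder(HTMLParser):
    """Builds a minimal element tree with the standard-library HTML parser."""

    def __init__(self):
        super().__init__()
        self.root = self.current = Node("#root")

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return  # void elements never receive an end tag
        node = Node(tag, self.current)
        self.current.children.append(node)
        self.current = node

    def handle_endtag(self, tag):
        node = self.current
        while node is not None and node.tag != tag:
            node = node.parent  # tolerate unclosed inner tags
        if node is not None and node.parent is not None:
            self.current = node.parent  # stray end tags (no match) are ignored

    def handle_data(self, data):
        self.current.text.append(data)


def normalize(text):
    """Bag of words with dictionary synonyms mapped to one canonical field."""
    return Counter(SEMANTIC_DICT.get(t, t) for t in re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def best_node(html):
    """Score every element against TOPIC and return the most similar one."""
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    best, best_score = None, 0.0
    stack = list(builder.root.children)
    while stack:
        node = stack.pop()
        score = cosine(normalize(node.all_text()), TOPIC)
        if score > best_score:
            best, best_score = node, score
        stack.extend(node.children)
    return best, best_score


if __name__ == "__main__":
    page = """<html><body>
<div id="nav">Home Download News Forum</div>
<div id="desc"><table>
<tr><td>Lang</td><td>Delphi</td></tr>
<tr><td>OS</td><td>Windows</td></tr>
<tr><td>Licence</td><td>Freeware</td></tr>
<tr><td>Ver</td><td>2.1</td></tr>
</table></div>
<div id="ads">Buy cheap hosting now</div>
</body></html>"""
    node, score = best_node(page)
    print(node.tag, round(score, 3))  # div 0.676
```

Cosine similarity rewards nodes whose text is concentrated on description fields: the enclosing `<html>` element also contains every field, but its navigation and advertising text dilutes its score, so the `div` wrapping the attribute table wins. Mapping `Lang`, `OS`, `Licence`, and `Ver` through the dictionary is what lets one topic vector match pages that label the same fields differently.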
Keywords/Search Tags: component, information extraction, extraction algorithm, component mining, component retrieval