An Algorithm Research On Component Description Extraction Internet-based Component Library System

Posted on:2013-04-13

Degree:Master

Type:Thesis

Country:China

Candidate:Z P Zhou

Full Text:PDF

GTID:2248330392457828

Subject:Computer software and theory

Abstract/Summary:

Component-based software development is considered a realistic way to address the software crisis. The approach presupposes that a large number of meta-components are available. To supply them, a number of component download sites have emerged on the Internet, but these resources are scattered, which makes it very inconvenient to find components accurately and comprehensively. Extracting component descriptions from these sites, reorganizing and integrating them, and then providing a unified portal for access to the components would therefore be of great practical significance.

Building an Internet-based component library system involves three key technologies: 1) topic crawler technology, which collects from the Internet the web documents that contain component information; 2) information extraction technology, which automatically obtains component descriptions from web pages and converts them into a semantically clearer, more structured format; and 3) component retrieval, which labels, mines, organizes, and stores the collected information and then provides a portal through which users can retrieve it. This study focuses on how to automatically obtain component descriptions from web pages. This step is the nexus of the research: it must analyze rough, mixed, and chaotic web pages, extract a usable component description, and then organize that description in a reasonable manner for subsequent component mining and retrieval.

Current Web information extraction algorithms fall into three categories: those based on HTML structure, those based on wrapper induction, and those based on semantic analysis of web pages. Because of the shortcomings of these algorithms on such pages, this thesis proposes a topic-based similarity approximation algorithm for component information extraction. Introducing a semantic dictionary into the extraction model masks the differences in how different component library websites describe components. The algorithm also exploits the fact that component descriptions are usually gathered under a specific node tag: it computes the topic similarity of each tag until it reaches the maximum similarity, and thereby locates the description precisely and extracts the information. Extensive experiments show that, while maintaining a high extraction rate, the algorithm greatly reduces the degree of human intervention and adapts well to dynamic changes in website structure.
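The abstract does not give the similarity formula, so the following Python sketch is only one possible reading of the idea: score every tag by how well its text matches a semantic dictionary, take the tag with the maximum score, and read field/value pairs from it. The dictionary contents, the scoring rule (a harmonic mean of field coverage and vocabulary precision), the tie-break toward deeper nodes, and the sample page are all assumptions made for illustration, not the thesis's actual method.

```python
import re
from html.parser import HTMLParser

# Hypothetical semantic dictionary: canonical field -> labels used by different sites.
SEMANTIC_DICT = {
    "name": {"name", "title", "component"},
    "version": {"version", "release"},
    "license": {"license", "licence"},
    "platform": {"platform", "os", "environment"},
    "description": {"description", "summary", "overview"},
}
VOCAB = set().union(*SEMANTIC_DICT.values())
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}


class Node:
    def __init__(self, tag, parent=None):
        self.tag, self.parent = tag, parent
        self.children = []  # child Nodes and text strings, in document order

    def fragments(self):
        """Yield every text fragment under this node in document order."""
        for child in self.children:
            if isinstance(child, str):
                yield child
            else:
                yield from child.fragments()

    def walk(self, depth=0):
        """Yield (node, depth) for this node and all descendant nodes."""
        yield self, depth
        for child in self.children:
            if isinstance(child, Node):
                yield from child.walk(depth + 1)


class TreeBuilder(HTMLParser):
    """Builds a small tag tree; tolerant of unclosed or stray end tags."""

    def __init__(self):
        super().__init__()
        self.root = Node("root")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:
            self.current = node

    def handle_endtag(self, tag):
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent

    def handle_data(self, data):
        if data.strip():
            self.current.children.append(data.strip())


def topic_similarity(node):
    """Assumed score: harmonic mean of field coverage and vocabulary precision."""
    tokens = set(re.findall(r"[a-z]+", " ".join(node.fragments()).lower()))
    if not tokens:
        return 0.0
    hit = sum(1 for synonyms in SEMANTIC_DICT.values() if tokens & synonyms)
    coverage = hit / len(SEMANTIC_DICT)
    precision = len(tokens & VOCAB) / len(tokens)
    total = coverage + precision
    return 2 * coverage * precision / total if total else 0.0


def locate_description(html):
    """Return the tag with maximum topic similarity (deeper node wins ties)."""
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    scored = builder.root.walk()
    return max(scored, key=lambda pair: (topic_similarity(pair[0]), pair[1]))[0]


def extract_fields(node):
    """Map each recognized label to the text fragment that follows it."""
    lookup = {s: field for field, syns in SEMANTIC_DICT.items() for s in syns}
    frags = list(node.fragments())
    record = {}
    for label, value in zip(frags, frags[1:]):
        field = lookup.get(label.lower().rstrip(":").strip())
        if field and field not in record:
            record[field] = value
    return record


# Made-up page: navigation and footer noise around one description table.
SAMPLE_PAGE = """
<html><body>
<div><a>Home</a><a>Download</a><a>Forum</a></div>
<div><table>
<tr><td>Name</td><td>FastGrid</td></tr>
<tr><td>Version</td><td>2.1</td></tr>
<tr><td>License</td><td>MIT</td></tr>
<tr><td>Platform</td><td>Windows</td></tr>
</table></div>
<div>Copyright notice and contact links</div>
</body></html>
"""

if __name__ == "__main__":
    best = locate_description(SAMPLE_PAGE)
    print(best.tag, extract_fields(best))
```

Because the dictionary maps site-specific labels onto canonical fields, a site that writes "Release" instead of "Version" would still land in the same field, and because the score depends on text rather than on a fixed path through the page, moving the table elsewhere in the layout would not change which node is selected.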

Keywords/Search Tags:

component, information extraction, extraction algorithm, component mining, component retrieval

Related items

1	Component Library Component Retrieval Theory
2	Component Retrieval Based On Ontology
3	Component Description Method And Retrieval Strategies In Program Mining
4	Research On Component Classification And Selection Method Based On Group Intelligence Algorithm
5	Research On The Adaptation Method Of Component Based On The Multidimensional Mapping Retrieval
6	The Research Of The Key Methods Of The RTCOM-based Component Repository Management
7	Research On Algorithms For Component Extraction Using Existing Codes
8	Research On Ant Colony Algorithm Based Component Retrieval Method
9	On Image Retrieval Based On Unsupervised Component Analysis
10	Research On Driver Component Of Embedded Linux System
