Research On Technology Of Software Component Obtaining From The Internet

Posted on:2011-01-01

Degree:Master

Type:Thesis

Country:China

Candidate:D K Xu

Full Text:PDF

GTID:2178360302999162

Subject:Computer Science and Technology

Abstract/Summary:

With the rapid development of the Internet, the network has become a platform for sharing resources. Most resources on the network are presented as web pages, and component library resources are no exception. The purpose of this thesis is to find component resources on the network, analyze web pages that may contain them, and extract the component information to local disk. To achieve these objectives, the thesis processes web data in five steps.

First, the thesis uses BNF to describe the components to be crawled from the Web. A component storage model and a baseline document are generated from this description, and both are used in the subsequent chapters. Second, it identifies whether web pages from the Internet are on the component topic, using a Bayesian TF-IDF algorithm over four aspects of each page (page content, virtual text, title text and keyword text), and stores the pages relevant to the component topic. Third, it sorts the URLs awaiting processing with a crawling strategy that combines PageRank and Shark-Search. With this comprehensive strategy the crawler visits high-priority URLs first and avoids topic drift during crawling. Fourth, it analyzes web pages containing component information with a page segmentation algorithm based on relevance and the visual characteristics of the page, and identifies the topic blocks in each page. Fifth, it builds four matrices from the adjacency constraints, feature constraints, location constraints and relevance between entities, and then clusters the entities with an improved transitive closure method. At the end of that chapter, based on the baseline document and the storage model, it matches the clustered entities to the attributes of the component storage model and generates an XML document to store the extracted component information.

Together, these steps implement an exploratory technique for obtaining components from the Internet. The summary of each chapter also notes how the four technologies could be further improved; these notes indicate directions for continued research.
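
The fifth step, clustering entities with a transitive closure method, can be illustrated with the textbook form of fuzzy clustering. The sketch below is not the thesis's improved method, which the abstract does not detail. It assumes the four constraint matrices have already been merged into a single reflexive, symmetric similarity matrix; the function names, the sample matrix and the thresholds are all hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def max_min_compose(a: Matrix, b: Matrix) -> Matrix:
    """Max-min composition of two square fuzzy relations."""
    n = len(a)
    return [
        [max(min(a[i][k], b[k][j]) for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


def transitive_closure(relation: Matrix) -> Matrix:
    """Square the relation under max-min composition until it stops changing.

    Assumes the input is reflexive and symmetric, so the sequence
    R, R^2, R^4, ... converges to the transitive closure.
    """
    current = relation
    while True:
        squared = max_min_compose(current, current)
        if squared == current:
            return current
        current = squared


def lambda_cut_clusters(closure: Matrix, threshold: float) -> List[List[int]]:
    """Group entity indices whose closure similarity reaches the threshold."""
    n = len(closure)
    assigned = [False] * n
    clusters: List[List[int]] = []
    for i in range(n):
        if assigned[i]:
            continue
        group = [j for j in range(n) if closure[i][j] >= threshold]
        for j in group:
            assigned[j] = True
        clusters.append(group)
    return clusters


# Hypothetical similarity between four extracted entities (assumed values).
similarity: Matrix = [
    [1.0, 0.8, 0.1, 0.2],
    [0.8, 1.0, 0.4, 0.2],
    [0.1, 0.4, 1.0, 0.9],
    [0.2, 0.2, 0.9, 1.0],
]

closure = transitive_closure(similarity)
print(lambda_cut_clusters(closure, 0.5))  # [[0, 1], [2, 3]]
```

Raising the threshold splits clusters and lowering it merges them: with these assumed values, a threshold of 0.4 puts all four entities into one cluster.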

Keywords/Search Tags:

Component Obtaining, Component Description Model, Topic Page-Recognition, Comprehensive Crawling Strategy, Fuzzy Clustering

Related items

1	Key Technology Research On Web Forums Crawling And Hot Topic Detection
2	Study On Focused Crawling Technique For Vertical Search Engine
3	Research On Topic Web Page Crawling Strategy For Vertical Search Engine
4	Component Facet Description And Retrieval Research Based On Component Credibility
5	Research On Component Description And Retrieval Of Component Based On Quality Evaluation
6	The Component Library Management Model And System Implementation For ICEMDA
7	Software Component Description Based On Ontology
8	Research About Key Problems Of Component Based Software Development
9	Research On Component Based Development And Its Supporting Tools
10	The Design And Realization Of Real-time System Component Library