Incremental Web Data Mining System Based On Classification Tree

Posted on: 2014-03-08
Degree: Master
Type: Thesis
Country: China
Candidate: S J Wang
Full Text: PDF
GTID: 2268330422463466
Subject: Computer system architecture
Abstract/Summary:
With the rapid development and growing popularity of the Internet, the information and resources available online are growing explosively, at a geometric rate. Faced with such a huge volume of resources, how to make effective use of the public resources on the Internet is a real problem. Most Web information and resources exist in the form of HTML documents, and the nature of HTML means that this information cannot be used directly. How to mine data and resources on the Internet effectively is therefore the central problem this project addresses.

This thesis studies how to efficiently mine the resources of a target Web site and save them to a structured database. The Web information mining system contains three main parts: classification tree mining, resource list mining, and incremental mining judgment.

Classification tree mining: the system first mines the classification structure of the Web site. The administrator then performs classification mapping through the management system and manages which classifications are to be mined. The classification tree is the backbone of the entire system, and all subsequent functions follow this trunk.

Resource list mining: the system obtains the entry address from the classification tree, then obtains the list of resources under that classification page by page.

Incremental mining judgment: the system loops over the resource list, parses each resource to obtain its update time and ID, and then uses an incremental algorithm to determine whether the resource is new or updated. If it is, the system goes on to mine the resource's details page.

In the system implementation, the approach was tested extensively on a concrete example: deep, incremental mining of film and video resources on a film resource website, based on its classification tree. The test results were good and the efficiency was very high.
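The incremental mining judgment described in the abstract can be illustrated with a minimal Python sketch. The abstract states only that each listed resource's ID and update time are used to decide whether it is new or updated; the names below (`ListedResource`, `judge`, `select_for_detail_mining`) and the in-memory `seen` dictionary standing in for the structured database are assumptions made for illustration, not the thesis's actual implementation.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class ListedResource:
    """One entry parsed from a classification's resource list page (hypothetical type)."""
    resource_id: str
    updated_at: datetime


def judge(entry, seen):
    """Classify a listed resource as 'new', 'updated', or 'unchanged'.

    `seen` maps resource ID -> update time recorded by the previous crawl
    (an assumed stand-in for the structured database).
    """
    previous = seen.get(entry.resource_id)
    if previous is None:
        return "new"
    if entry.updated_at > previous:
        return "updated"
    return "unchanged"


def select_for_detail_mining(entries, seen):
    """Return the entries whose details page should be mined, recording them in `seen`."""
    to_fetch = []
    for entry in entries:
        if judge(entry, seen) != "unchanged":
            to_fetch.append(entry)
            seen[entry.resource_id] = entry.updated_at
    return to_fetch
```

Under these assumptions, only resources judged new or updated reach the details-page step, so a repeat crawl of an unchanged list fetches no detail pages.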
Keywords/Search Tags: data mining, classification tree, incremental mining, plug-in mode