
Web crawler indexing: An approach by clustering

Posted on: 2005-07-29
Degree: M.S
Type: Thesis
University: University of Nevada, Reno
Candidate: Menon, Dhanya C
Full Text: PDF
GTID: 2458390008489867
Subject: Computer Science
Abstract/Summary:
Data mining is a class of database applications that searches for hidden patterns in a collection of data, patterns that can be used to predict future behavior. Knowledge discovery in databases (KDD) is the process of extracting previously unknown, potentially useful information from data; it takes the raw results of data mining and transforms them into understandable information. With the growth of online data on the web, the opportunity and the necessity have arisen to apply data mining techniques to effective web information retrieval. Web crawlers, also known as agents, robots, or spiders, are programs that work continuously behind the scenes; their essential role is to download information from the web and to maintain an index of the downloaded pages.

With the enormous growth in the number of web sites and pages, indexing becomes a major bottleneck for effective querying and searching of information. This thesis outlines the algorithms and techniques required to build an efficient web crawler, named TechSpider, which applies a hierarchical clustering method as a step toward indexing the downloaded information. Clustering improves the efficiency of searching and querying the index for a particular keyword. The thesis focuses on the algorithm that implements hierarchical clustering based on keywords.
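The abstract names keyword-based hierarchical clustering as the indexing step but does not reproduce the algorithm, so the following is only a minimal sketch of agglomerative clustering over keyword sets. The page names, keyword sets, the Jaccard distance measure, the 0.6 merge threshold, and the function names are illustrative assumptions, not TechSpider's actual implementation.

```python
# Minimal sketch (assumed, not the thesis's TechSpider code): agglomerative
# hierarchical clustering of crawled pages by their extracted keywords.
# Each cluster is represented by the union of its members' keyword sets,
# and clusters are merged greedily by smallest Jaccard distance.

def jaccard_distance(a, b):
    """Return 1 - |A ∩ B| / |A ∪ B| for two keyword sets."""
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)

def hierarchical_cluster(pages, max_distance=0.6):
    """Repeatedly merge the two closest clusters until no remaining
    pair is closer than max_distance (a hypothetical cutoff)."""
    # Each cluster is (set_of_page_ids, union_of_keywords).
    clusters = [({pid}, set(kws)) for pid, kws in pages.items()]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = jaccard_distance(clusters[i][1], clusters[j][1])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        if d > max_distance:
            break  # remaining clusters are too dissimilar to merge
        ids = clusters[i][0] | clusters[j][0]
        kws = clusters[i][1] | clusters[j][1]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append((ids, kws))
    return clusters

if __name__ == "__main__":
    # Hypothetical crawled pages and their extracted keywords.
    pages = {
        "pageA": {"crawler", "index", "search"},
        "pageB": {"crawler", "spider", "robot"},
        "pageC": {"database", "mining", "patterns"},
        "pageD": {"data", "mining", "knowledge"},
    }
    for ids, kws in hierarchical_cluster(pages):
        print(sorted(ids), "->", sorted(kws))
```

Grouping pages this way means a keyword query can first be matched against cluster-level keyword sets and then narrowed to the pages inside the best-matching cluster, which is the efficiency gain the abstract attributes to clustering-based indexing.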
Keywords/Search Tags: Web, Data, Indexing