
Web crawler indexing: An approach by clustering

Posted on: 2005-07-29
Degree: M.S
Type: Thesis
University: University of Nevada, Reno
Candidate: Menon, Dhanya C
Full Text: PDF
GTID: 2458390008489867
Subject: Computer Science
Abstract/Summary:
Data mining is a class of database applications that searches for hidden patterns in a collection of data, patterns that can be used to predict future behavior. Knowledge discovery in databases (KDD) is the process of extracting previously unknown, potentially useful information from data; it takes the raw results of data mining and transforms them into understandable information. With the growth of online data on the web, the opportunity and the necessity have arisen to apply data mining techniques to effective web information retrieval. Web crawlers, also known as agents, robots, or spiders, are programs that work continuously behind the scenes; their essential role is to download information from the web and to maintain an index of the downloaded pages.

With the enormous growth in the number of web sites and pages, indexing becomes a major bottleneck for effective querying and searching of information. This thesis outlines the algorithms and techniques required to build an efficient web crawler, named TechSpider, which applies a hierarchical clustering method as a step toward indexing the downloaded information. Clustering improves the efficiency of searching and querying the index for a particular keyword. The thesis focuses on the algorithm that implements hierarchical clustering based on keywords.
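The abstract names keyword-based hierarchical clustering as the indexing step but does not reproduce the algorithm, so the following is only a minimal sketch of agglomerative clustering over keyword sets. The page names, keyword sets, the Jaccard distance measure, the 0.6 merge threshold, and the function names are illustrative assumptions, not TechSpider's actual implementation.

```python
# Minimal sketch (assumed, not the thesis's TechSpider code): agglomerative
# hierarchical clustering of crawled pages by their extracted keywords.
# Each cluster is represented by the union of its members' keyword sets,
# and clusters are merged greedily by smallest Jaccard distance.

def jaccard_distance(a, b):
    """Return 1 - |A ∩ B| / |A ∪ B| for two keyword sets."""
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)

def hierarchical_cluster(pages, max_distance=0.6):
    """Repeatedly merge the two closest clusters until no remaining
    pair is closer than max_distance (a hypothetical cutoff)."""
    # Each cluster is (set_of_page_ids, union_of_keywords).
    clusters = [({pid}, set(kws)) for pid, kws in pages.items()]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = jaccard_distance(clusters[i][1], clusters[j][1])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        if d > max_distance:
            break  # remaining clusters are too dissimilar to merge
        ids = clusters[i][0] | clusters[j][0]
        kws = clusters[i][1] | clusters[j][1]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append((ids, kws))
    return clusters

if __name__ == "__main__":
    # Hypothetical crawled pages and their extracted keywords.
    pages = {
        "pageA": {"crawler", "index", "search"},
        "pageB": {"crawler", "spider", "robot"},
        "pageC": {"database", "mining", "patterns"},
        "pageD": {"data", "mining", "knowledge"},
    }
    for ids, kws in hierarchical_cluster(pages):
        print(sorted(ids), "->", sorted(kws))
```

Grouping pages this way means a keyword query can first be matched against cluster-level keyword sets and then narrowed to the pages inside the best-matching cluster, which is the efficiency gain the abstract attributes to clustering-based indexing.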
Keywords/Search Tags: Web, Data, Indexing