Research And Implementation For Web Spider Based On Web Data Mining

Posted on:2008-03-23

Degree:Master

Type:Thesis

Country:China

Candidate:J J Zhan

GTID:2178360242979318

Subject:Computer software and theory

Abstract/Summary:

Spider (crawler) programs are a key component of a search engine and an effective means of gathering information from the World Wide Web. Drawing on Web data mining techniques and working within the overall architecture of a search engine, this thesis implements an Internet spider that crawls pages and stores their data in a local database, laying the foundation for an intelligent search engine. The main contents are as follows. First, the thesis analyzes the principles of search engines and implements the first stage of a search engine's work: retrieving page data from the Internet. Second, it describes the technologies used, including the HTTP protocol, regular expressions, multithreading, and ADO.NET. Building on existing spider techniques, it then analyzes and designs a new spider system. Using a breadth-first search (BFS) strategy combined with multithreading, the system implements algorithms for crawling pages on internal and external networks and for analyzing their content. The thesis makes two main contributions. First, regular expressions are applied to Web content so that URLs can be extracted quickly and efficiently, which supports both the crawling of internal networks and the page content analysis algorithms; the collected data is then compressed with zlib and stored in the local database. Second, to increase crawling speed, a special strategy handles faulty URLs: the server's response time determines whether a page is fetched, and URLs that time out are placed in an error queue to be processed by a dedicated thread.
Third, analysis of experiments on a campus network and of the data stored in the database validates the feasibility of the spider and shows that the system's intended objectives have been achieved. Finally, the thesis presents conclusions about the whole system and directions for future work.
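
The pipeline described in the abstract (regular-expression URL extraction, breadth-first crawling, zlib compression, database storage, and a separate queue for URLs that time out) can be illustrated with a minimal sketch. This is not the thesis's code: the thesis used .NET with ADO.NET, while this sketch uses Python with sqlite3 as a stand-in database. The names `extract_links`, `crawl`, `load_page`, the `pages` table, and the caller-supplied `fetch` function are all assumptions made for illustration, and the sketch is single-threaded, whereas the thesis combines BFS with multiple threads.

```python
import re
import sqlite3
import zlib
from collections import deque
from urllib.parse import urljoin

# Simplified pattern for href attributes (an assumption of this sketch);
# it stops at '#', so fragments are dropped and pure "#anchor" links are skipped.
HREF_RE = re.compile(r'href\s*=\s*["\']([^"\'#]+)', re.IGNORECASE)


def extract_links(base_url, html):
    """Return absolute http(s) URLs found in html, in order, without duplicates."""
    links = []
    for raw in HREF_RE.findall(html):
        url = urljoin(base_url, raw.strip())
        if url.startswith(("http://", "https://")) and url not in links:
            links.append(url)
    return links


def crawl(seed, fetch, db, max_pages=100):
    """Breadth-first crawl starting at seed.

    fetch(url) is a hypothetical caller-supplied function that returns HTML
    text or raises TimeoutError when the server responds too slowly.
    Returns the list of URLs that timed out, left for separate handling.
    """
    db.execute("CREATE TABLE IF NOT EXISTS pages (url TEXT PRIMARY KEY, body BLOB)")
    frontier = deque([seed])   # FIFO queue gives breadth-first order
    seen = {seed}              # URLs already queued, to avoid revisits
    failed = []                # stand-in for the error queue
    stored = 0
    while frontier and stored < max_pages:
        url = frontier.popleft()
        try:
            html = fetch(url)
        except TimeoutError:
            failed.append(url)
            continue
        # Compress the page body with zlib before storing it.
        body = zlib.compress(html.encode("utf-8"))
        db.execute("INSERT OR REPLACE INTO pages VALUES (?, ?)", (url, body))
        stored += 1
        for link in extract_links(url, html):
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    db.commit()
    return failed


def load_page(db, url):
    """Read a stored page back and decompress it, or return None if absent."""
    row = db.execute("SELECT body FROM pages WHERE url = ?", (url,)).fetchone()
    return zlib.decompress(row[0]).decode("utf-8") if row else None
```

A caller would pass a `fetch` function that performs an HTTP request with a timeout and a database connection such as `sqlite3.connect("pages.db")`. In a multithreaded design like the one the abstract describes, the returned list would instead be a shared queue drained by a dedicated worker thread.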

Keywords/Search Tags:

web mining, network spider, search engine

Related items

1	Professional Search Engine Research And Design
2	The Research Of The Personalized Search Engine
3	Design And Implementation Of A Spider For Topic-Specific Search Engine
4	Research And Achievement Of The Search Strategic For The Topic Search Engine Spider
5	Design And Implementation Of Search Engine Based On The Web Data Mining
6	The Theme Of The Search Engine Web Spider Search Strategy Study
7	Web Spider Design And Realization Of Intelligent Search Engine
8	Network-based Professional Search Engine Spiders Search Strategy
9	The Research And Implementation On The Spider Of The Vertical Search Engines Based On The Reinforcement Learning
10	Design And Realize Of Spider In Vertical Search Engine