
The Research And Design Of Network Information Monitoring And Analysis System

Posted on: 2009-01-06 | Degree: Master | Type: Thesis
Country: China | Candidate: L L Li
GTID: 2178360272980742 | Subject: Computer application technology
Abstract/Summary:
As World Wide Web technology matures and Internet applications become ever more widespread, the Internet has grown into a huge distributed information space. Accessing the Web has become an important way for enterprises and individuals alike to obtain information. However, traditional manual collection and processing methods cannot cope with such a volume of information. Moreover, Web pages are written mainly in HTML, which is not structured, and the Web itself is characterized by disordered hyperlinks, a massive amount of content, diversity, and constant change. Automated search technology has greatly increased the efficiency and speed of searching, but when applied to a specific domain it still suffers from low precision (irrelevant information is not filtered out) and a low hit rate (useful information is missed).

Based on the characteristics of HTML pages, currently the most widely used page format, this paper studies existing network information processing technologies such as automatic information collection, preprocessing, and automatic classification. To overcome the shortcomings of current information search technologies, a network information monitoring system dedicated to the automobile information domain is designed and developed. It automatically searches for and collects useful information from several specific automobile-related websites. The system has been tested and used by an automobile information consulting company, which gave very positive feedback.

The paper focuses on the design and implementation of the two core subsystems of the system: the network information collection subsystem and the intelligent analysis and pre-classification subsystem, which are responsible for collecting, preprocessing, and automatically classifying network information. Several technologies are introduced and highlighted, such as a non-recursive method combined with multithreading, and parallel technology, which together effectively improve efficiency and speed.
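The abstract does not spell out how the non-recursive, multithreaded method works. A common way to avoid recursion in a crawler is to keep the frontier in an explicit shared queue that several worker threads drain, with a lock-protected visited set so that no URL is claimed twice. The Python sketch below illustrates that general pattern under those assumptions; `crawl`, its parameters, and the injected `fetch_links` callback are hypothetical names, and the callback stands in for downloading and parsing a page so the example runs without network access.

```python
from __future__ import annotations

import queue
import threading
from typing import Callable, Iterable


def crawl(seeds: Iterable[str],
          fetch_links: Callable[[str], list[str]],
          num_threads: int = 4,
          max_pages: int = 100) -> set[str]:
    """Non-recursive, multithreaded crawl driven by a shared work queue.

    fetch_links is a hypothetical callback that downloads a page and returns
    the URLs it links to; injecting it keeps the sketch free of network code.
    """
    frontier: queue.Queue[str] = queue.Queue()
    visited: set[str] = set()
    lock = threading.Lock()

    def enqueue(url: str) -> None:
        # Check-and-add under one lock so two threads never claim the same URL.
        with lock:
            if url in visited or len(visited) >= max_pages:
                return
            visited.add(url)
        frontier.put(url)

    def worker() -> None:
        while True:
            url = frontier.get()
            try:
                links = fetch_links(url)
            except Exception:
                links = []  # a failed download simply yields no new links
            for link in links:
                enqueue(link)
            # Mark done only after the new links are queued, so join() below
            # cannot return while work is still pending.
            frontier.task_done()

    for url in seeds:
        enqueue(url)
    for _ in range(num_threads):
        threading.Thread(target=worker, daemon=True).start()
    frontier.join()
    return visited


if __name__ == "__main__":
    site = {"a": ["b", "c"], "b": ["a", "d"], "c": ["d"], "d": []}
    print(sorted(crawl(["a"], lambda u: site.get(u, []))))  # ['a', 'b', 'c', 'd']
```

Keeping the frontier in a queue rather than on the call stack avoids stack overflow on long link chains and lets any idle thread pick up the next URL; the `max_pages` cap bounds the crawl.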
The parallel technology is implemented using exchange mode for parallel crawling, which effectively resolves the problems of repeated crawling and missed crawling. URL filtering is introduced in the web page collection process, and a threshold method is used in the web page classification process; together these greatly improve the usefulness of the collected information.

The test group of an automobile consulting company has tested this network information monitoring system and verified its usability and effectiveness. The system has also achieved good results in practical use, to the customers' satisfaction.
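The exchange mode, URL filter, and threshold method are only named in the abstract. In the usual description of parallel crawlers, exchange mode means that each crawling process owns one partition of the URL space and forwards every URL it discovers outside that partition to the owning process, so each URL is fetched by exactly one process (no repeated crawling) and none is dropped (no missed crawling). The sketch below assumes partitioning by host name, an allow-list plus file-extension URL filter, and a keyword-ratio score for the threshold classifier. The hosts, extensions, keyword sets, and all function and class names are hypothetical and are not taken from the thesis.

```python
from __future__ import annotations

import re
import zlib
from collections import deque
from urllib.parse import urlsplit

# Hypothetical URL-filter rules: only HTTP(S) pages on allow-listed automobile
# sites, skipping images, scripts, stylesheets and archives.
ALLOWED_HOSTS = {"auto.example.com", "cars.example.org"}
SKIPPED_EXTENSIONS = re.compile(r"\.(jpe?g|png|gif|css|js|pdf|zip)$", re.IGNORECASE)


def passes_filter(url: str) -> bool:
    """URL filter: reject off-topic hosts and non-HTML resources before fetching."""
    parts = urlsplit(url)
    return (parts.scheme in ("http", "https")
            and (parts.hostname or "") in ALLOWED_HOSTS
            and not SKIPPED_EXTENSIONS.search(parts.path))


def owner(url: str, num_nodes: int) -> int:
    """Map a URL's host to exactly one crawler node, identically in every process."""
    host = urlsplit(url).hostname or ""
    return zlib.crc32(host.encode("utf-8")) % num_nodes


class CrawlerNode:
    """One crawling process in exchange mode (simulated in-process here)."""

    def __init__(self, node_id: int, num_nodes: int) -> None:
        self.node_id = node_id
        self.num_nodes = num_nodes
        self.frontier: deque[str] = deque()
        self.seen: set[str] = set()

    def accept(self, url: str) -> None:
        # Every URL is accepted only by its owner, so this local check is
        # enough to keep it from being queued twice across all nodes.
        if url not in self.seen:
            self.seen.add(url)
            self.frontier.append(url)

    def route(self, links: list[str], nodes: list[CrawlerNode]) -> None:
        # Keep own-partition links and forward the rest to their owners
        # instead of dropping them, so no reachable page is missed.
        for url in links:
            if passes_filter(url):
                nodes[owner(url, self.num_nodes)].accept(url)


def classify(text: str, category_keywords: dict[str, set[str]],
             threshold: float) -> str | None:
    """Threshold method: return the best category only if its score reaches the
    threshold; otherwise treat the page as irrelevant and return None."""
    words = re.findall(r"[a-z]+", text.lower())
    if not words or not category_keywords:
        return None
    # Assumed score: fraction of the page's words that are category keywords.
    scores = {cat: sum(w in kws for w in words) / len(words)
              for cat, kws in category_keywords.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] >= threshold else None
```

`zlib.crc32` is used instead of Python's built-in `hash()` because string hashing is randomized per process, so separate crawler processes would otherwise disagree about which node owns a host. In a real deployment, `accept` on another node would be reached through a message queue or RPC rather than a direct method call.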
Keywords/Search Tags: internet information, monitoring, analyze, webpage collection, webpage cleaning, webpage classification