The Study And Implementation Of Website Analysis System Based On Vertical Search Technology

Posted on: 2009-02-26
Degree: Master
Type: Thesis
Country: China
Candidate: Z Hu
Full Text: PDF
GTID: 2178360245989628
Subject: Computer application technology
Abstract/Summary:
Today the Internet is used ever more widely in all fields of economic and social life. As the Internet economy develops swiftly, problems caused by websites operating outside laws and regulations continue to increase. Although the Ministry of Information Industry of China has developed a system, the Registration and Record Management Information System for ICP/IP Address/Domain Name, to carry out some simple management functions, many websites without records cannot be detected in time with the existing technical means, and supervision of the standardized operation of online websites is not effective enough either. This places new demands on Internet management.
After an in-depth study and analysis of the current state of website supervision, this project presents a new way to implement management and statistics functions for the registrations and records of all websites in Sichuan Province. First, the DNS logs and IP addresses provided by ISPs are collected and cross-matched with the data from the system of the Ministry of Information Industry. Then, core technologies of vertical search, including information acquisition and natural language parsing, are applied to provide dynamic real-time scanning, monitoring, control and deep mining of website data. In this way, the Website Analysis System not only achieves proactive and timely supervision and management of the Internet, but also adapts better to new anti-supervision techniques.
This thesis first introduces vertical search and its related technologies, then proposes a framework and a technical approach based on vertical search technologies for the Website Analysis System. After that, it describes the design of each subsystem in detail.
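The cross-matching step mentioned in the abstract can be pictured as a set difference between domains observed in ISP DNS logs and domains present in the registration records. The following is only a minimal sketch under assumptions: the function names, the normalization rule (lowercasing, dropping a trailing dot and a leading "www.") and the sample domains are hypothetical and are not taken from the thesis.

```python
def normalize(domain: str) -> str:
    """Lowercase a domain and drop a trailing dot and a leading 'www.' (assumed rule)."""
    d = domain.strip().lower().rstrip(".")
    return d[4:] if d.startswith("www.") else d


def find_unregistered(dns_log_domains, registered_domains):
    """Return domains seen in DNS logs that have no matching registration record."""
    registered = {normalize(d) for d in registered_domains}
    seen = {normalize(d) for d in dns_log_domains}
    # Set difference: observed in traffic but absent from the records.
    return sorted(seen - registered)


if __name__ == "__main__":
    # Hypothetical sample data for illustration only.
    logs = ["WWW.Example.cn.", "foo.cn", "bar.cn"]
    records = ["example.cn", "bar.cn"]
    print(find_unregistered(logs, records))
```

A real system would also have to match on IP addresses and handle subdomains, which this sketch leaves out.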
The main part of this thesis focuses on the study and implementation of two core technologies of a vertical search engine: the web spider and Chinese word segmentation. The web spider used in this project adopts a database queue, a multi-process and multi-thread design, and a technique for parsing web page elements step by step. The Chinese word segmentation combines two approaches: a mechanical method and a statistical method. A statistical Chinese word segmentation dictionary with word priorities is used to obtain the segmentation with the fewest segments, which is taken as the optimal segmentation. Preliminary testing showed that the performance and accuracy of this vertical search engine meet the demands of the system. The final part of this thesis introduces the overall design of the project's security defense system, covering computer network security, disaster backup and recovery, access control and management mechanisms.
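The idea of choosing the segmentation with the fewest segments from a dictionary with word priorities can be illustrated with a small dynamic program. This is a sketch under stated assumptions, not the thesis's algorithm: the tie-breaking rule (prefer the higher total priority among segmentations with equally few words), the single-character fallback for unknown text, the `max_word_len` parameter and the sample dictionary are all hypothetical.

```python
def segment(text, dictionary, max_word_len=4):
    """Split text into the fewest dictionary words; ties go to higher total priority.

    dictionary maps word -> priority (assumed: larger means more preferred).
    Characters not covered by any dictionary word are emitted as single-character words.
    """
    n = len(text)
    # best[i] = (word_count, negated_total_priority, start_index_of_last_word) for text[:i]
    best = [(float("inf"), 0.0, -1)] * (n + 1)
    best[0] = (0, 0.0, -1)
    for i in range(1, n + 1):
        for j in range(max(0, i - max_word_len), i):
            word = text[j:i]
            if word in dictionary:
                priority = dictionary[word]
            elif i - j == 1:
                priority = 0.0  # fallback: unknown single character
            else:
                continue
            candidate = (best[j][0] + 1, best[j][1] - priority, j)
            # Compare on (word_count, negated priority) only.
            if candidate[:2] < best[i][:2]:
                best[i] = candidate
    # Walk back through the stored start indices to recover the words.
    words = []
    i = n
    while i > 0:
        j = best[i][2]
        words.append(text[j:i])
        i = j
    return words[::-1]


if __name__ == "__main__":
    # Hypothetical dictionary with made-up priorities.
    demo_dict = {"研究": 5, "研究生": 3, "生命": 5, "命": 1, "起源": 5}
    print(segment("研究生命起源", demo_dict))
```

In this example both "研究 / 生命 / 起源" and "研究生 / 命 / 起源" use three words, so the assumed priority tie-break is what selects the first.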
Keywords/Search Tags:Vertical Search, Website Analysis, Website Supervision, Web Spider, Chinese Word Segmentation