
Research And Implementation Of Web Content Monitoring

Posted on: 2013-06-24
Degree: Master
Type: Thesis
Country: China
Candidate: L Wang
GTID: 2248330374485710
Subject: Information security

Abstract/Summary:
Given current social conditions and an increasingly complex network environment, online information has a major effect on social stability and on people's lives. Preventing the spread and viewing of illegal information through active supervision and protection of network security has become an important issue for social stability, and research into active website monitoring technology can make a practical contribution.

This thesis studies open-source web crawlers, web information extraction, and text mining, and on that basis implements a network content monitoring system, WCMS (Web Content Monitor System). WCMS actively supervises website content, analyzes it, ranks hot news, classifies content, and displays topics, so that a supervisor can conveniently analyze a website's content and identify hot spots. The work covers the research and implementation of three independent modules:

1) WCMS spider and information extraction module: The WCMS spider extends an open-source web crawler to fetch raw web data. Building on research into web information extraction, the module then extracts the body text, title, comments, and other key information from the raw pages, and stores the raw data and the extracted information separately by category to simplify later processing.

2) WCMS information processing module: A database interface is implemented through which webpage text content can be accessed and manipulated. Based on research into text mining, webpage text is preprocessed, classified, and clustered, and the results are stored with the corresponding webpage record. A module that estimates the hotness (popularity) of webpage content is designed and implemented by assigning different weights to the webpage title, author, comments, classification result, and clustering result. To alert the supervisor to hot spots, webpage content is ranked by popularity.

3) WCMS monitor management module: Early-warning thresholds can be set independently by the supervisor. When the hotness value of a webpage exceeds the preset threshold, WCMS presents the related webpage information to the supervisor. The supervisor can also check the main topics the monitored website focuses on, browse pages grouped by category, and search target pages by keyword, with the results displayed by category.
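As a rough illustration of the extraction step in module 1, the following Python sketch uses only the standard library's html.parser to pull a page's title and paragraph text out of raw HTML. It is not the thesis's extractor: treating <title> as the title and <p> elements as the body text is a simplifying assumption, the names PageExtractor and extract are hypothetical, and comment extraction (which is site-specific) is omitted.

```python
from html.parser import HTMLParser


class PageExtractor(HTMLParser):
    """Hypothetical extractor: collects <title> text and <p> text, skipping script/style."""

    def __init__(self):
        super().__init__()
        self.title_parts = []
        self.paragraphs = []
        self._in_title = False
        self._in_p = False
        self._skip_depth = 0  # > 0 while inside <script> or <style>
        self._current = []    # text fragments of the <p> being read

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "title":
            self._in_title = True
        elif tag == "p":
            self._in_p = True
            self._current = []

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1
        elif tag == "title":
            self._in_title = False
        elif tag == "p" and self._in_p:
            # Collapse runs of whitespace and keep only non-empty paragraphs.
            text = " ".join("".join(self._current).split())
            if text:
                self.paragraphs.append(text)
            self._in_p = False

    def handle_data(self, data):
        if self._skip_depth:
            return
        if self._in_title:
            self.title_parts.append(data)
        elif self._in_p:
            self._current.append(data)


def extract(html):
    """Return the page title and body text (paragraphs joined by newlines)."""
    parser = PageExtractor()
    parser.feed(html)
    parser.close()
    return {
        "title": " ".join("".join(parser.title_parts).split()),
        "text": "\n".join(parser.paragraphs),
    }
```

For example, extract("<title>Daily News</title><p>First  paragraph.</p><p>Second <b>one</b>.</p>") gives the title "Daily News" and the text "First paragraph.\nSecond one.". Real pages with unclosed <p> tags or layout-driven content would need a more robust, template- or density-based extractor.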
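The abstract does not name the classification algorithm used in module 2. Purely as an illustration of that step, here is a swapped-in technique: a multinomial naive Bayes classifier with add-one (Laplace) smoothing over whitespace-tokenized text. The function names and training data are hypothetical, and Chinese text would first need word segmentation during preprocessing.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Toy preprocessing (assumption): lowercase and split on whitespace."""
    return text.lower().split()


def train(docs):
    """docs: list of (text, label) pairs. Returns the counts naive Bayes needs."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)  # label -> word -> count
    vocab = set()
    for text, label in docs:
        label_counts[label] += 1
        for word in tokenize(text):
            word_counts[label][word] += 1
            vocab.add(word)
    return label_counts, word_counts, vocab


def classify(model, text):
    """Return the label with the highest log posterior (multinomial NB, add-one smoothing)."""
    label_counts, word_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in label_counts.items():
        score = math.log(n_docs / total_docs)  # log prior P(label)
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            if word not in vocab:
                continue  # ignore words never seen in training
            count = word_counts[label][word]
            score += math.log((count + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```

Working in log space avoids floating-point underflow when many word probabilities are multiplied together; the stored label could then be attached to the webpage record as the classification result.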
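Module 2 estimates a webpage's hotness from weighted contributions of its title, author, comments, classification result, and clustering result, and module 3 alerts the supervisor when that value exceeds a threshold. The abstract gives neither the features nor the weights, so the numbers below are hypothetical placeholders. This sketch assumes each factor has already been normalized to [0, 1], takes the hotness as their weighted sum, ranks pages in descending order, and flags pages above the threshold.

```python
from dataclasses import dataclass

# Hypothetical weights for the five factors named in the abstract. They sum to 1,
# so hotness stays in [0, 1] when every factor score is in [0, 1].
WEIGHTS = {
    "title": 0.25,
    "author": 0.10,
    "comments": 0.35,
    "category": 0.15,
    "cluster": 0.15,
}


@dataclass
class Page:
    url: str
    scores: dict  # factor name -> normalized score in [0, 1] (assumed precomputed)


def hotness(page, weights=WEIGHTS):
    """Weighted sum of the page's factor scores; a missing factor counts as 0."""
    return sum(w * page.scores.get(name, 0.0) for name, w in weights.items())


def rank_and_alert(pages, threshold):
    """Return (pages sorted by hotness, descending; pages whose hotness exceeds threshold)."""
    ranked = sorted(pages, key=hotness, reverse=True)
    alerts = [p for p in ranked if hotness(p) > threshold]
    return ranked, alerts
```

Keeping the weights in one table makes it easy to tune them without touching the scoring code; how each raw signal (for example, a comment count) is mapped to [0, 1] is left open here.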
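The keyword retrieval in module 3, with results shown by category, can be sketched with an inverted index from terms to page IDs. The index is intersected across the query keywords, and the hits are grouped by each page's stored classification label. The thesis does not describe its index, so the record layout (page_id mapped to a (category, text) pair) and the function names are hypothetical.

```python
from collections import defaultdict


def build_index(pages):
    """pages: dict page_id -> (category, text). Returns term -> set of page ids."""
    index = defaultdict(set)
    for page_id, (_category, text) in pages.items():
        for term in text.lower().split():
            index[term].add(page_id)
    return index


def search_by_category(pages, index, query):
    """Return {category: sorted page ids} for pages containing every query keyword."""
    terms = query.lower().split()
    if not terms:
        return {}
    # .get avoids inserting empty entries into the defaultdict for unknown terms.
    hits = set.intersection(*(index.get(t, set()) for t in terms))
    grouped = defaultdict(list)
    for page_id in sorted(hits):
        grouped[pages[page_id][0]].append(page_id)
    return dict(grouped)
```

For instance, with pages 1 and 3 labeled "finance" and page 2 labeled "sports", all mentioning "bank", the query "bank" returns {"finance": [1, 3], "sports": [2]}.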
Keywords/Search Tags: Web content monitoring, WCMS, information extraction, text classification, text clustering