Design Of A Web Data Acquisition And Reduction System

Posted on: 2011-05-03
Degree: Master
Type: Thesis
Country: China
Candidate: J C Tian
GTID: 2178330332988247
Subject: Computer system architecture
Abstract/Summary:
The Internet is gradually changing our lives. The network brings convenience, but it also raises many security issues that require attention. Intranet users reach external networks mostly through a Web browser. The web pages they browse need to be acquired and reduced (that is, restored to their original form) in real time, so that the content users access can be monitored as it happens, access to illegal or unhealthy web pages can be discovered promptly, and the network access habits of Intranet users can be summarized. This is the purpose of the research.

The acquisition of web data and the reassembly of TCP sessions are built on an in-depth analysis of the TCP/IP and HTTP protocols. libpcap is used to build the packet capture program, and the finite state machine transition technique in CLAY is used to filter and analyze the packets. The system then uses MySQL to build a database that stores the collected data. Finally, a Perl program extracts the data from the database and creates local files to hold it. To reduce the web data, hyperlinks are created that point to the data in these local files, which localizes the web links and completes the reduction of the web data.

The system is implemented on the Linux operating system. The web pages that Intranet users access are captured in listening mode, and the collected web data are restored. The expected goal of the research is therefore achieved.
Keywords/Search Tags:Protocol analysis, Content supervisory control, Web data acquisition, Web data reduction, HTTP
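
The TCP session reassembly step mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the thesis implementation, which is built on libpcap, CLAY, MySQL, and Perl. The names Segment, reassemble, and split_http_message are hypothetical, the segments are assumed to be already captured and grouped by connection, and sequence-number wraparound is ignored.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One captured TCP segment (hypothetical structure for illustration)."""
    seq: int        # sequence number of the first payload byte
    payload: bytes  # application data carried by the segment


def reassemble(segments, initial_seq):
    """Rebuild one direction of a TCP byte stream.

    Segments are ordered by sequence number, retransmitted bytes are
    dropped, and reassembly stops at the first gap (a missing segment).
    Assumption: sequence numbers do not wrap around.
    """
    stream = bytearray()
    next_seq = initial_seq
    for seg in sorted(segments, key=lambda s: s.seq):
        end = seg.seq + len(seg.payload)
        if end <= next_seq:
            continue  # every byte was already seen (retransmission)
        if seg.seq > next_seq:
            break     # gap in the stream: later data cannot be placed yet
        # Keep only the bytes that extend the stream (handles overlap).
        stream += seg.payload[next_seq - seg.seq:]
        next_seq = end
    return bytes(stream)


def split_http_message(stream):
    """Split a reassembled stream into start line, headers, and body."""
    head, _, body = stream.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return lines[0], headers, body
```

Given the segments of one HTTP response, reassemble returns the contiguous byte stream and split_http_message separates the header block from the page body, which is the part a system like the one described would store and later restore.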