
Design and Implementation of a High-Speed System for Collecting and Analyzing Large-Capacity Data Based on the HTTP Protocol

Posted on: 2013-08-20
Degree: Master
Type: Thesis
Country: China
Candidate: J X Xu
Full Text: PDF
GTID: 2248330371966375
Subject: Electronic and communications
Abstract/Summary:
With the rapid development of Internet technology and the rapid growth of knowledge-intensive industries, people's activities are moving online at an accelerating pace. Research on Internet user behavior needs to capture users' explicit information demands directly and to infer their implicit information demands indirectly. Doing so not only overcomes the blindness with which network information service systems are often built, but also makes network information management and services more proactive and better targeted, so that they better meet users' information needs.
Starting from the system requirements analysis and the key problems, this thesis presents, in order, the algorithms that address the key problems, the overall architecture design of the system, and the detailed design and implementation of its sub-modules. It also introduces several key technologies used to build the system, including data capture, data reassembly and restoration, and thread pooling. Functional testing and performance testing were completed, and both sets of results indicate that the system runs reliably.
Much of the work in this thesis targets data acquisition accuracy and data analysis efficiency. For acquisition, multi-threaded programming is combined with the libpcap library to capture data while preserving its integrity and accuracy. For analysis efficiency, the reconstruction and storage of user behavior data use a proposed tree-layered, chained storage structure keyed by three combinations of elements (destination and source IP address, destination and source port, and HTTP data type), together with an efficient memory allocation and management mechanism. The thesis demonstrates the contribution of both to performance optimization.
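The layered, chained storage idea can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the names `FlowStore`, `Segment`, `add`, and `reassemble` are hypothetical, the three levels are modeled with plain dictionaries, and the chain is a simple list ordered by sequence number at read time. Retransmissions, overlapping segments, sequence-number wraparound, and the custom memory allocator are all left out.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One captured payload fragment (hypothetical record layout)."""
    seq: int
    payload: bytes


class FlowStore:
    """Three-level lookup: IP pair -> port pair -> HTTP data type -> chain.

    Illustrative only; the thesis does not specify this exact layout.
    """

    def __init__(self):
        self._by_ip = {}

    def add(self, src_ip, dst_ip, src_port, dst_port, data_type, seq, payload):
        # Walk (and create on demand) one level per key combination.
        by_port = self._by_ip.setdefault((src_ip, dst_ip), {})
        by_type = by_port.setdefault((src_port, dst_port), {})
        chain = by_type.setdefault(data_type, [])
        chain.append(Segment(seq, payload))

    def reassemble(self, src_ip, dst_ip, src_port, dst_port, data_type):
        # Order the chain by sequence number and join the payloads.
        chain = self._by_ip[(src_ip, dst_ip)][(src_port, dst_port)][data_type]
        ordered = sorted(chain, key=lambda s: s.seq)
        return b"".join(s.payload for s in ordered)


store = FlowStore()
# Segments may arrive out of order; the store reorders them on read.
store.add("10.0.0.2", "93.184.216.34", 50000, 80, "html", 200, b"world")
store.add("10.0.0.2", "93.184.216.34", 50000, 80, "html", 100, b"hello ")
body = store.reassemble("10.0.0.2", "93.184.216.34", 50000, 80, "html")
```

Keying each level separately means that all data for one host pair, or for one connection, can be located or released without scanning unrelated flows, which is one plausible reason a layered index helps analysis speed.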
The analysis of Internet user behavior data in this thesis focuses on average page retention time, page jump rate, and page bounce rate. Compared with the traditional analysis of page hits, these metrics reflect page browsing and visitor access patterns more accurately.
Web pages typically contain many kinds of file resources, such as HTML text, images, animations, and embedded pages. Embedded pages distort the computation of page retention time, so this thesis designs an embedded page identification algorithm that eliminates their impact on that computation.
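To make the effect of embedded pages on retention time concrete, here is a small Python sketch. The abstract does not describe the thesis's identification algorithm, so the rule used here is an assumption chosen purely for illustration: a request is treated as embedded when its Referer is the current top-level page and it arrives within a short window (`EMBED_WINDOW`, a hypothetical threshold) after that page. Bounce rate is computed in the common sense of single-page sessions divided by all sessions.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

EMBED_WINDOW = 2.0  # seconds; hypothetical threshold, not from the thesis


@dataclass
class PageView:
    timestamp: float
    url: str
    referer: Optional[str]


def top_level_views(views: List[PageView]) -> List[PageView]:
    """Drop views that look embedded under the assumed heuristic."""
    result: List[PageView] = []
    for view in sorted(views, key=lambda v: v.timestamp):
        if result:
            current = result[-1]
            loaded_by_current = view.referer == current.url
            soon_after = view.timestamp - current.timestamp <= EMBED_WINDOW
            if loaded_by_current and soon_after:
                continue  # treated as an embedded page, not a navigation
        result.append(view)
    return result


def retention_times(views: List[PageView]) -> List[Tuple[str, float]]:
    """Time on each page = gap until the next top-level page view."""
    pages = top_level_views(views)
    return [(a.url, b.timestamp - a.timestamp) for a, b in zip(pages, pages[1:])]


def bounce_rate(sessions: List[List[PageView]]) -> float:
    """Fraction of sessions containing exactly one top-level page view."""
    if not sessions:
        return 0.0
    bounces = sum(1 for s in sessions if len(top_level_views(s)) == 1)
    return bounces / len(sessions)


session = [
    PageView(0.0, "/a", None),
    PageView(0.5, "/ad-frame", "/a"),   # embedded in /a
    PageView(10.0, "/b", "/a"),         # real navigation from /a
    PageView(10.3, "/widget", "/b"),    # embedded in /b
]
times = retention_times(session)
```

Without the filter, the embedded frame at 0.5 s would be counted as the next page and the retention time of `/a` would be recorded as 0.5 s instead of 10 s.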
Keywords/Search Tags: web page retention time, memory allocation and management, multithreading, embedded page