Optimization And Application Of Hashtable In Data Collection System

Posted on: 2018-12-30
Degree: Master
Type: Thesis
Country: China
Candidate: H Wu
Full Text: PDF
GTID: 2348330518996117
Subject: Electronics and Communications Engineering
Abstract/Summary:
With the development of the Internet, network access now reaches every aspect of our lives, and the number of Internet users keeps growing as mobile devices evolve. This dramatic growth produces enormous volumes of data, which can also cause many problems. If this traffic is used properly, it can contribute to the sustainable development of the Internet. By analyzing the traffic, we can monitor network congestion more accurately and more quickly. We can also use this data to analyze user behavior and deliver a customized experience. As the first step of data analysis, traffic collection, which is the entry point for the original traffic flow, plays an important role.

The main functions of a data collection system are parsing packets, matching and associating messages, and outputting formatted logs. In the matching component, a hash table keyed on the 5-tuple is used to group packets of the same flow together. This thesis designs several ways to reduce the cost of hash table operations so that the system loses fewer packets. To reach this goal, we first propose performance metrics to quantify hash functions and collision-resolution methods. Then, after investigating the data collection system, we identify two main situations that may cause packet loss. For each situation, several methods are presented to optimize the hash table module and reduce packet loss. We also run tests and comparisons to verify the optimized performance under real traffic.

This thesis also analyzes how the hash table's memory footprint affects its processing speed. We derive the distribution of chain lengths in a chaining hash table so that higher collection accuracy can be achieved with a smaller memory footprint. Based on this length distribution, we propose a sizable hash table to optimize hash performance.

Finally, we summarize these hash table optimizations and outline directions for future work.
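
As a rough illustration of the matching component described in the abstract, the following sketch shows a chaining hash table keyed on a 5-tuple, together with a helper that tabulates the chain-length distribution. It is not the thesis's implementation: the names `FiveTuple` and `ChainedFlowTable`, the field layout, the bucket count, and the use of Python's built-in `hash` as the hash function are all assumptions made for this example.

```python
from collections import Counter, namedtuple

# Hypothetical 5-tuple layout; the field names are assumptions for illustration.
FiveTuple = namedtuple("FiveTuple", "src_ip dst_ip src_port dst_port proto")


class ChainedFlowTable:
    """Fixed-size hash table with separate chaining, keyed on a 5-tuple."""

    def __init__(self, num_buckets=1024):
        self.num_buckets = num_buckets
        # Each bucket is a chain (list) of (key, packets) entries.
        self.buckets = [[] for _ in range(num_buckets)]

    def _index(self, key):
        # Python's built-in tuple hash stands in for the thesis's hash function.
        return hash(key) % self.num_buckets

    def add_packet(self, key, packet):
        """Append the packet to its flow, creating the flow entry if needed."""
        chain = self.buckets[self._index(key)]
        for entry_key, packets in chain:
            if entry_key == key:
                packets.append(packet)
                return
        chain.append((key, [packet]))

    def packets_for(self, key):
        """Return the packets stored for a flow, or an empty list."""
        chain = self.buckets[self._index(key)]
        for entry_key, packets in chain:
            if entry_key == key:
                return packets
        return []

    def chain_length_distribution(self):
        """Map chain length to the number of buckets with that length."""
        return Counter(len(chain) for chain in self.buckets)


# Usage: two flows, three packets, eight buckets.
table = ChainedFlowTable(num_buckets=8)
flow_a = FiveTuple("10.0.0.1", "10.0.0.2", 40000, 80, 6)
flow_b = FiveTuple("10.0.0.3", "10.0.0.2", 40001, 53, 17)
table.add_packet(flow_a, b"syn")
table.add_packet(flow_a, b"ack")
table.add_packet(flow_b, b"query")
distribution = table.chain_length_distribution()
```

Here longer chains mean more key comparisons per lookup, so the chain-length distribution gives one way to reason about the trade-off between bucket count (memory footprint) and lookup cost that the abstract mentions.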
Keywords/Search Tags: 5-tuple, chaining hash optimization, data collection system, distribution of chain's length