
Behavior Analysis of Network Nodes Based on Hadoop

Posted on: 2016-07-25   Degree: Master   Type: Thesis
Country: China   Candidate: P Yang
GTID: 2298330467493077   Subject: Signal and Information Processing
Abstract/Summary:
In recent years, with the rapid development of Internet technology and the continuous improvement of China's network infrastructure, advanced network technologies and applications have spread quickly while the number of Internet users continues to grow. The proportion of government agencies that handle official business by computer has risen to 93 percent, and most enterprises have adopted enterprise information systems and joined the international information highway. The growing popularity of the Internet, the rapid expansion of network scale, and the rapid increase in the number of network nodes bring people great convenience, but they also create new problems. Many security loopholes remain, exposing Internet users to a variety of network attacks, so studying and analyzing the behavior of network nodes is of great significance. At the same time, as the number of users grows and network traffic rises sharply, the storage and transmission requirements of network data have far exceeded the processing capacity of traditional databases. Apache Hadoop is an open-source software framework for the distributed processing of large data sets, and it makes distributed storage of large volumes of data straightforward. This thesis first introduces the background and significance of network node behavior analysis, then introduces Hadoop and network behavior monitoring and analysis systems. Next, based on the communication characteristics and flow characteristics of network sessions, it proposes a new network session record, the Compound Session (CS), which reflects the packet-level characteristics of a network session in more detail. CS data acquisition and preprocessing provide the foundation for the experimental analysis in this thesis.
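The abstract does not list the fields of a Compound Session record. As a rough illustration of how packet-level data can be grouped into session records in the map/reduce style that Hadoop uses, the plain-Python sketch below keys each packet by its 5-tuple in a map step, groups by key in a simulated shuffle, and summarizes each group in a reduce step. The packet layout and the summary fields (`packets`, `bytes`, `duration`, `mean_packet_size`) are hypothetical stand-ins, not the thesis's CS definition, and no Hadoop API is called.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical packet record: (src_ip, src_port, dst_ip, dst_port, proto, timestamp_s, size_bytes)
Packet = Tuple[str, int, str, int, str, float, int]
SessionKey = Tuple[str, int, str, int, str]


def map_packet(pkt: Packet) -> Tuple[SessionKey, Tuple[float, int]]:
    """Map phase: key each packet by its 5-tuple, keep (timestamp, size) as the value."""
    src_ip, src_port, dst_ip, dst_port, proto, ts, size = pkt
    return (src_ip, src_port, dst_ip, dst_port, proto), (ts, size)


def shuffle(
    pairs: Iterable[Tuple[SessionKey, Tuple[float, int]]]
) -> Dict[SessionKey, List[Tuple[float, int]]]:
    """Group values by key, standing in for Hadoop's shuffle/sort step."""
    groups: Dict[SessionKey, List[Tuple[float, int]]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_session(key: SessionKey, values: List[Tuple[float, int]]) -> dict:
    """Reduce phase: combine one session's packets into a single summary record."""
    times = [ts for ts, _ in values]
    sizes = [size for _, size in values]
    return {
        "key": key,
        "packets": len(values),
        "bytes": sum(sizes),
        "start": min(times),
        "duration": max(times) - min(times),
        "mean_packet_size": sum(sizes) / len(sizes),
    }


def build_sessions(packets: Iterable[Packet]) -> List[dict]:
    """Run map, shuffle, and reduce in sequence; output is sorted by session key."""
    groups = shuffle(map_packet(p) for p in packets)
    return [reduce_session(k, v) for k, v in sorted(groups.items())]
```

On a real cluster, the same map and reduce logic would run as Hadoop tasks (for example through Hadoop Streaming), with the framework performing the shuffle across machines.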
Based on the CS data, this thesis analyzes the number of users accessing each network node and the traffic of each node, and reveals how both quantities are distributed across nodes. The original K-means algorithm is sensitive to the choice of initial cluster centers, and its evaluation function considers only the difference within clusters. To address these defects, an improved algorithm is proposed that optimizes the selection of initial cluster centers and uses a balanced evaluation function. Experimental results show that the improved algorithm effectively eliminates the instability of the clustering results and greatly improves clustering accuracy. In addition, network node clustering analysis is carried out with a distributed K-means algorithm implemented on the Hadoop platform. Finally, the ARIMA model is used to predict the traffic and the number of users of network nodes, and it yields good predictions. To detect abnormal network nodes and overcome the deficiencies of earlier approaches, a new anomaly detection algorithm based on threshold determination and distance calculation is proposed; it is fast, efficient, and updates its detection of abnormal network nodes in real time.
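The abstract does not specify how the improved algorithm selects initial centers or what its balanced evaluation function is. One common way to remove K-means' dependence on random seeding, swapped in here purely for illustration, is deterministic max-min (farthest-first) initialization. The NumPy sketch below writes each iteration as a map step (assign every point to its nearest center) and a reduce step (average the points assigned to each center), which is how K-means is usually decomposed for MapReduce. It also includes a simple distance-threshold check as one hypothetical reading of "threshold determination and distance calculation". The function names and the fixed `threshold` parameter are assumptions, and the thesis's evaluation function is not reproduced.

```python
import numpy as np


def maxmin_init(X: np.ndarray, k: int) -> np.ndarray:
    """Deterministic max-min seeding: start from the point nearest the data mean,
    then repeatedly add the point farthest from all centers chosen so far."""
    first = int(np.argmin(np.linalg.norm(X - X.mean(axis=0), axis=1)))
    centers = [X[first]]
    for _ in range(1, k):
        # Distance from each point to its nearest already-chosen center.
        d = np.min([np.linalg.norm(X - c, axis=1) for c in centers], axis=0)
        centers.append(X[int(np.argmax(d))])
    return np.array(centers, dtype=float)


def map_assign(X: np.ndarray, centers: np.ndarray) -> np.ndarray:
    """Map step: emit the index of the nearest center for every point."""
    dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    return np.argmin(dists, axis=1)


def reduce_update(X: np.ndarray, labels: np.ndarray, centers: np.ndarray) -> np.ndarray:
    """Reduce step: each new center is the mean of its assigned points
    (an empty cluster keeps its previous center)."""
    new = centers.copy()
    for j in range(len(centers)):
        members = X[labels == j]
        if len(members):
            new[j] = members.mean(axis=0)
    return new


def kmeans(X, k: int, max_iter: int = 100, tol: float = 1e-9):
    """Iterate map/reduce steps until the centers stop moving."""
    X = np.asarray(X, dtype=float)
    centers = maxmin_init(X, k)
    for _ in range(max_iter):
        labels = map_assign(X, centers)
        new = reduce_update(X, labels, centers)
        if np.allclose(new, centers, atol=tol):
            break
        centers = new
    return centers, map_assign(X, centers)


def flag_anomalies(X, centers: np.ndarray, threshold: float) -> np.ndarray:
    """Flag points whose distance to the nearest center exceeds a fixed threshold."""
    X = np.asarray(X, dtype=float)
    d = np.min(np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2), axis=1)
    return d > threshold
```

Because the seeding is deterministic, repeated runs on the same data give the same clustering. Max-min seeding is, however, sensitive to outliers, since an isolated point is likely to be picked as a center, so outliers are often filtered out before clustering.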
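ARIMA(p, d, q) models a series after d rounds of differencing, using p autoregressive and q moving-average terms. The abstract does not state the orders used. As a minimal sketch, the NumPy code below fits only the ARIMA(p, 1, 0) special case, Δy_t = c + φ_1·Δy_{t-1} + ... + φ_p·Δy_{t-p} + ε_t, by ordinary least squares, then forecasts by iterating the fitted recurrence and undoing the differencing. A full ARIMA fit with moving-average terms is normally estimated by maximum likelihood, for example with the `ARIMA` class in statsmodels; the function names here are hypothetical.

```python
import numpy as np


def fit_ar_on_diff(series, p: int = 1) -> np.ndarray:
    """Fit ARIMA(p,1,0) by least squares: difference once, then regress each
    difference on an intercept and its p previous differences."""
    if p < 1:
        raise ValueError("p must be at least 1")
    y = np.asarray(series, dtype=float)
    d = np.diff(y)
    if len(d) <= p:
        raise ValueError("series too short for the requested order")
    # Row for time t: [1, d[t-1], d[t-2], ..., d[t-p]]
    A = np.array([np.r_[1.0, d[t - p:t][::-1]] for t in range(p, len(d))])
    b = d[p:]
    coef, *_ = np.linalg.lstsq(A, b, rcond=None)
    return coef  # [c, phi_1, ..., phi_p]


def forecast(series, coef: np.ndarray, steps: int = 1) -> list:
    """Iterate the fitted difference equation forward and undo the differencing."""
    y = list(np.asarray(series, dtype=float))
    d = list(np.diff(y))
    p = len(coef) - 1
    out = []
    for _ in range(steps):
        lags = d[-p:][::-1]  # most recent difference first, matching the fit
        next_d = coef[0] + float(np.dot(coef[1:], lags))
        d.append(next_d)
        y.append(y[-1] + next_d)
        out.append(y[-1])
    return out
```

Applied to a node's hourly traffic or user counts, `fit_ar_on_diff` would be run on the historical series and `forecast` would extend it by the desired number of steps; choosing p (and adding MA terms) would normally be guided by residual diagnostics or an information criterion.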
Keywords/Search Tags: Hadoop, big data, CS, network node, behavior analysis