Analysis Of Enterprise User Behavior Based On The Big Data

Posted on: 2016-02-26
Degree: Master
Type: Thesis
Country: China
Candidate: S Y Ren
Full Text: PDF
GTID: 2298330467992550
Subject: Signal and Information Processing
Abstract/Summary:
By now, the Internet in China has reached a mature scale, and Internet applications have gradually developed from simple to diverse. The Internet is changing the way people study, work and live, and it affects the progress of society as a whole more and more profoundly. With the rapid development of the Internet, we are gradually stepping into the "big data era". Data, including enterprise data, web site data, shopping data and so on, has become an important part of our virtual and real lives. Faced with very large volumes of data and their varied structures, traditional relational databases have not yet solved the problems posed by big data well, and neither have stand-alone machines. The MapReduce programming model proposed by Google, however, can process large data sets quickly and efficiently. Hadoop, a cloud platform based on the MapReduce programming model, is a software framework for distributed processing. In recent years, Hadoop has therefore become an indispensable tool for analyzing big data.

In this thesis, we first introduce the background and significance of the research. Second, from the perspective of network applications, we give a preliminary account of network flow, and then introduce the concept, content, significance and methods of network user behavior analysis under big data. Next, we briefly introduce the Hadoop technologies used in this thesis, including the Hadoop Distributed File System (HDFS), the MapReduce programming model, the HBase distributed storage system and the data analysis tool Hive. Finally, we present the system framework of the thesis and the data analysis process.

We analyze the HTTP messages, the DNS messages and the data in the host flow table collected from enterprise network users. Using the compound data that we define, we analyze them from the user's perspective and obtain the host flow tables, namely the real-time table, the day-time table and the month-time table. We then study and visualize several properties of the real-time table and the day-time table in order to analyze the behavior of enterprise users. Finally, we test and verify whether a short-term prediction model, the Autoregressive Integrated Moving Average (ARIMA) model, can predict the total daily Internet traffic of an enterprise, and we find the optimum parameters of the model.
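
To illustrate how a day-time host flow table might be derived from raw flow records, the following is a minimal sketch in Python. It is not the thesis's Hadoop implementation: it only mimics the map and reduce phases in memory. The record layout (Unix timestamp, host IP, byte count), the sample values and the function names are all assumptions made for this example.

```python
from collections import defaultdict
from datetime import datetime, timezone

# Hypothetical flow records: (unix_timestamp, host_ip, bytes_transferred).
# The real host flow table in the thesis may carry different fields.
records = [
    (1456444800, "10.0.0.1", 1200),  # 2016-02-26 00:00:00 UTC
    (1456448400, "10.0.0.1", 800),   # 2016-02-26 01:00:00 UTC
    (1456448400, "10.0.0.2", 500),   # 2016-02-26 01:00:00 UTC
    (1456531200, "10.0.0.1", 300),   # 2016-02-27 00:00:00 UTC
]


def map_record(record):
    """Map phase: emit a ((host, day), bytes) pair for one flow record."""
    ts, host, nbytes = record
    day = datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y-%m-%d")
    return (host, day), nbytes


def build_day_table(flow_records):
    """Reduce phase: sum the byte counts that share the same (host, day) key."""
    totals = defaultdict(int)
    for key, nbytes in map(map_record, flow_records):
        totals[key] += nbytes
    return dict(totals)


day_table = build_day_table(records)
for (host, day), total in sorted(day_table.items()):
    print(host, day, total)
```

Keying on (host, month) instead of (host, day) would give a month-time table in the same way. The per-day totals summed over all hosts form the kind of daily traffic series that a model such as ARIMA could then be fitted to.
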
Keywords/Search Tags: network traffic, big data, distributed processing, enterprise user's behavior, the host flow table