Data Mining Of Network Virtual Identity Based On Spark Technology

Posted on:2018-01-23

Degree:Master

Type:Thesis

Country:China

Candidate:Y H Song

GTID:2348330518996937

Subject:Electronics and Communications Engineering

Abstract/Summary:

Along with the rapid development of Internet technology, the Internet has had a profound impact on people's lifestyles and brings convenience to their daily lives. Through the Internet, people can easily access information and communicate freely with each other. The Internet provides new ways of learning, entertainment, communication, and sharing, and it occupies an important position in people's lives. In online communities, the concept of the virtual identity of network users has gradually attracted attention. Nowadays, all kinds of websites and applications require users to register and log in, so people's daily network access behavior generates a large amount of data containing virtual identity information. These massive numbers of virtual accounts contain users' personal information. Although this virtual identity information is not exactly the same as a user's real identity, the two are bound to have some potential links. Therefore, data analysis methods can be applied to this massive virtual identity data to extract useful information. We can obtain users' identity characteristics, such as gender, age, interests, and hobbies, which gives us a deeper understanding of users. On the one hand, we can then offer different users a better service experience and personalized information based on their network behavior; on the other hand, from the provider's point of view, targeted services can greatly reduce the cost of pushing information. Faced with massive data processing problems, the performance of a traditional single computer can no longer meet the huge computing demands.
So we need an efficient way to process the data. Apache Spark, a distributed computing system, is now widely used for massive data processing. This thesis first introduces the basic concepts of the virtual identity of network users and the current state of virtual identity data mining. Secondly, it introduces the framework structure and operating mechanism of the Spark platform, the theory of the MapReduce programming model, and the distributed storage architecture of HDFS, and it describes storage, preprocessing, and data analysis in detail. Then, it describes in detail the algorithms for processing massive virtual identity data and for virtual data mining on the Spark platform. Finally, it introduces the virtual data mining algorithm and the process of implementing it on Spark, and then analyzes and interprets the experimental results.
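The abstract refers to the MapReduce programming model that underlies the platform discussion. As an illustration only, the following pure-Python sketch simulates the map, shuffle, and reduce phases in a single process. The record layout (user ID, identity type, identity value) and the task of counting distinct virtual identities per user are hypothetical and are not taken from the thesis. On Spark, the same pattern would typically be written with RDD transformations such as `map` followed by `groupByKey` or `reduceByKey`.

```python
from collections import defaultdict
from itertools import chain
from typing import Callable, Dict, Hashable, Iterable, List, Tuple

Pair = Tuple[Hashable, object]


def map_phase(records: Iterable, mapper: Callable[[object], Iterable[Pair]]) -> List[Pair]:
    # Apply the mapper to every record and flatten the emitted (key, value) pairs.
    return list(chain.from_iterable(mapper(r) for r in records))


def shuffle_phase(pairs: Iterable[Pair]) -> Dict[Hashable, list]:
    # Group all values by key, as the shuffle step does between map and reduce.
    groups: Dict[Hashable, list] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_phase(groups: Dict[Hashable, list],
                 reducer: Callable[[Hashable, list], object]) -> Dict[Hashable, object]:
    # Collapse each key's list of values into a single result.
    return {key: reducer(key, values) for key, values in groups.items()}


def identity_mapper(record: Tuple[str, str, str]) -> Iterable[Pair]:
    # Hypothetical record layout: (user_id, identity_type, identity_value).
    user_id, id_type, id_value = record
    yield user_id, (id_type, id_value)


def distinct_count_reducer(user_id: Hashable, values: list) -> int:
    # Count the distinct virtual identities observed for one user.
    return len(set(values))


def count_identities(records: Iterable[Tuple[str, str, str]]) -> Dict[Hashable, object]:
    # Chain the three phases: map -> shuffle -> reduce.
    pairs = map_phase(records, identity_mapper)
    return reduce_phase(shuffle_phase(pairs), distinct_count_reducer)


if __name__ == "__main__":
    sample = [
        ("u1", "email", "alice@example.com"),
        ("u1", "qq", "12345"),
        ("u2", "email", "bob@example.com"),
        ("u1", "email", "alice@example.com"),  # duplicate sighting of the same identity
    ]
    print(count_identities(sample))  # {'u1': 2, 'u2': 1}
```

Separating the mapper and reducer from the phase functions mirrors how MapReduce frameworks let the user supply only those two functions while the framework handles grouping and distribution.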

Keywords/Search Tags:

virtual identity, data mining, Spark, trajectory prediction, Markov model
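The keywords pair trajectory prediction with a Markov model. The abstract does not describe the thesis's own model or data, so the following is only a minimal sketch, assuming a first-order Markov chain over discrete locations: transition counts are estimated from observed trajectories, and the most frequent successor of the current location is predicted. The class name and location labels are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Hashable, Iterable, Optional, Sequence


class FirstOrderMarkovPredictor:
    """Hypothetical first-order Markov model: the next location depends only on the current one."""

    def __init__(self) -> None:
        # transitions[current][next] = number of observed moves current -> next
        self.transitions: Dict[Hashable, Counter] = defaultdict(Counter)

    def fit(self, trajectories: Iterable[Sequence[Hashable]]) -> "FirstOrderMarkovPredictor":
        # Count every consecutive pair of locations in each trajectory.
        for traj in trajectories:
            for cur, nxt in zip(traj, traj[1:]):
                self.transitions[cur][nxt] += 1
        return self

    def probability(self, cur: Hashable, nxt: Hashable) -> float:
        # Maximum-likelihood estimate of P(next = nxt | current = cur).
        counts = self.transitions.get(cur)
        if not counts:
            return 0.0
        return counts[nxt] / sum(counts.values())

    def predict(self, cur: Hashable) -> Optional[Hashable]:
        # Most frequent observed successor, or None if cur was never followed by anything.
        counts = self.transitions.get(cur)
        if not counts:
            return None
        return counts.most_common(1)[0][0]


if __name__ == "__main__":
    model = FirstOrderMarkovPredictor().fit([
        ["home", "office", "gym", "home"],
        ["home", "office", "cafe"],
        ["home", "office", "gym"],
    ])
    print(model.predict("office"))              # gym
    print(model.probability("office", "gym"))   # 0.6666666666666666
```

A first-order chain is the simplest choice; higher-order variants condition on several previous locations at the cost of sparser transition counts.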

Related items

1	Research On Trajectory Prediction And Intention Mining
2	Research On Indoor Trajectory Prediction Methods Based On WiFi Data
3	Research And Implementation Of Network User Virtual Identity Analysis System Based On Big Data
4	Sparse Trajectory Prediction Methods Based On Entropy Estimation
5	Research On Methods Of Trajectory Data Mining
6	Research On Trajectory Prediction Algorithm Based On Sequential Pattern Mining
7	Research On Trajectory Frequent Pattern Mining Algorithm Based On Spark
8	A Trajectory Data Segmentation Method And Application Based On Semantic Features
9	Research And Implementation Of Data Mining System Based On Improved Forecasting Model
10	Research On Key Technologies Of Trajectory Data Mining