Design And Implementation Of Network User Identification System Based On Behavior Similarity

Posted on: 2019-01-31
Degree: Master
Type: Thesis
Country: China
Candidate: S Y Zeng
GTID: 2348330542998180
Subject: Computer technology
Abstract/Summary:
Social network user identification is an important area of network user research and plays a key role in tracking user behavior and detecting illegal activities. Existing user identification methods have several shortcomings: user information is difficult to collect, most methods remain academic studies that process only small amounts of data and are hard to apply in practical engineering, and recognition accuracy still needs improvement. Studying network user identification algorithms, and designing and implementing a high-precision identification system that can cope with large-scale data processing, has therefore become an important research direction. In this thesis, the Learning from Positive and Unlabeled Examples (PU_learning) algorithm is studied and improved, and a network user identification system based on behavioral similarity is designed and implemented. The work done and the results achieved are as follows:

i. Survey of related work. After a thorough review of the relevant algorithms and technologies, including network user identification and machine learning, the shortcomings of existing methods and techniques are identified, and the work focuses on semi-supervised learning algorithms.

ii. Improvement of the PU_learning algorithm. User behavior features are designed from the perspectives of time and space. The traditional PU_learning algorithm is combined with GBDT (gradient boosting decision tree), and the accuracy of the model is improved step by step through iterative training, so that it can be used to identify network users across platforms. Experimental results show that the improved PU_learning algorithm improves accuracy and recall by 1% and 3%, respectively.

iii. Design and implementation of a network user identification system based on behavior similarity. Starting from raw traffic capture, the thesis designs and implements a traffic capture module, a user information matching module, and a model training and prediction module. The main problems solved are HTTP stream restoration in the user information matching module, and ETL feature extraction and the improvement of model training accuracy in the model training and prediction module. Because the system operates on big data, Hadoop and Hive are introduced as the data processing platform and tools to ensure that the system runs efficiently and stably.

iv. Testing of each functional module of the system. The results show that the network user identification system based on behavioral similarity functions stably and produces accurate identification results. During user information matching, the traffic parsing rate reaches about 1.6 Mbit/s, and the ETL stage can extract features from tens of millions of records from two network platforms within an hour.

Based on the above work, the thesis summarizes the problems that remain in the design and implementation of the network user identification system based on behavioral similarity, proposes ideas and methods for improvement, and outlines prospects for follow-up work.
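
The abstract describes combining PU_learning with GBDT through iterative training but gives no algorithmic detail. The following is a minimal sketch of one common way such a combination can work, not the thesis's actual method. It assumes a two-step PU scheme: treat all unlabeled samples as negatives, train a booster, keep the lowest-scored unlabeled samples as "reliable negatives", and retrain. To stay dependency-free, the booster is a toy gradient-boosted ensemble of decision stumps with squared loss, standing in for a full GBDT. All function names, the two features (a hypothetical time similarity and space similarity per account pair), and the synthetic data are assumptions made for illustration.

```python
import random


def fit_stump(X, residuals):
    """Return (feature, threshold, left_value, right_value) minimising squared error."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [r for row, r in zip(X, residuals) if row[j] <= t]
            right = [r for row, r in zip(X, residuals) if row[j] > t]
            if not right:  # a threshold at the maximum value splits nothing
                continue
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            err = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
            if best is None or err < best[0]:
                best = (err, j, t, lm, rm)
    return best[1:] if best else None


def fit_gbdt(X, y, n_rounds=10, lr=0.3):
    """Toy gradient boosting: each stump fits the residuals of the current prediction."""
    base = sum(y) / len(y)
    pred = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        stump = fit_stump(X, residuals)
        if stump is None:
            break
        j, t, lm, rm = stump
        stumps.append(stump)
        pred = [p + lr * (lm if row[j] <= t else rm) for p, row in zip(pred, X)]
    return base, lr, stumps


def predict(model, row):
    """Score one feature vector; higher means more likely the same user."""
    base, lr, stumps = model
    return base + sum(lr * (lm if row[j] <= t else rm) for j, t, lm, rm in stumps)


def pu_gbdt(positives, unlabeled, n_iters=3, keep_frac=0.5):
    """Two-step PU learning (assumed scheme): iteratively refine the negative set."""
    negatives = list(unlabeled)  # first pass: every unlabeled sample counts as negative
    model = None
    for _ in range(n_iters):
        X = positives + negatives
        y = [1.0] * len(positives) + [0.0] * len(negatives)
        model = fit_gbdt(X, y)
        # Keep only the lowest-scored unlabeled samples as reliable negatives.
        ranked = sorted(unlabeled, key=lambda row: predict(model, row))
        negatives = ranked[: max(1, int(keep_frac * len(unlabeled)))]
    return model


def cluster(center, n):
    """Synthetic (time_similarity, space_similarity) pairs around a centre value."""
    return [[random.gauss(center, 0.08), random.gauss(center, 0.08)] for _ in range(n)]


random.seed(0)
positives = cluster(0.8, 30)                    # account pairs known to match
unlabeled = cluster(0.8, 20) + cluster(0.2, 60)  # hidden matches mixed with non-matches
model = pu_gbdt(positives, unlabeled)
print("similar pair score:   ", predict(model, [0.8, 0.8]))
print("dissimilar pair score:", predict(model, [0.2, 0.2]))
```

In a real system the stump booster would normally be replaced by an established GBDT implementation, and `keep_frac` and the number of iterations would need tuning against held-out labeled pairs.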
Keywords/Search Tags:PU_Learning, user identification, GBDT