
Feature Extraction And Modeling Of User Behavior Based On Distributed Processing

Posted on: 2017-12-11
Degree: Master
Type: Thesis
Country: China
Candidate: Y Liu
Full Text: PDF
GTID: 2348330518495277
Subject: Information and Communication Engineering
Abstract/Summary:
With the growth of the Internet industry and the upgrading of telecom operators' network infrastructure, the data generated by Internet access is increasing in both scale and diversity. The combination of distributed processing technology with data science, including data mining and machine learning, has made feature extraction and user behavior analysis a popular research field. Acting as the data pipeline, telecom operators can observe all network traffic from a global perspective. There is considerable potential to build comprehensive user profiles by mining the DPI (deep packet inspection) datasets captured by telecom carriers. Against this background, this thesis studies a broadband DPI dataset of a certain city, collected by a major telecom operator in China. Distributed processing technology is combined with data mining methods to extract user behavior features from this dataset. Existing research on user behavior in carriers' DPI data usually focuses on the traffic distribution of different applications, or on the trend shown by the traffic curve over a period of time. By studying the websites that users browsed, this thesis instead extracts and models features of user preference when surfing the Internet from several aspects: website category, association rule mining, sequential pattern mining and online shopping behavior. Firstly, a web crawler fetches several directory websites that collect a large amount of website category information, and the category tag of each website a user browsed is obtained from them. In addition, the user's operating system is identified so that the interests of different user groups can be mined through statistical and clustering analysis. Secondly, sequential pattern mining is applied to discover frequent patterns in the chronological order in which websites are browsed. Furthermore, this thesis conducts a separate study of the web logs generated when users browse e-commerce websites. A web crawler makes it possible to extract user preference features from raw DPI data at a fine-grained level, including the products that users viewed and the brands that users preferred. Frequent itemsets and interesting association rules are discovered through modeling and experiments.
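
As a rough illustration of the association analysis mentioned in the abstract, the sketch below mines frequent pairs of website categories and derives rules from them using support and confidence. It is not the thesis's implementation: the function name `mine_pair_rules`, the thresholds, and the sample `sessions` data are all hypothetical, and a single-machine pair counter stands in for whatever distributed algorithm the author actually used.

```python
from collections import Counter
from itertools import combinations


def mine_pair_rules(transactions, min_support=0.5, min_confidence=0.6):
    """Return (antecedent, consequent, support, confidence) rules over item pairs.

    Hypothetical helper for illustration only; it considers itemsets of
    size two and does not implement full Apriori or FP-Growth.
    """
    n = len(transactions)
    item_counts = Counter()
    pair_counts = Counter()
    for items in transactions:
        unique = sorted(set(items))
        item_counts.update(unique)
        pair_counts.update(combinations(unique, 2))

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue  # the pair is not frequent enough
        # Try the rule in both directions: a -> b and b -> a.
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return sorted(rules)


# Assumed toy data: the set of website categories each user visited.
sessions = [
    {"news", "shopping", "video"},
    {"news", "shopping"},
    {"video", "games"},
    {"news", "shopping", "games"},
]
rules = mine_pair_rules(sessions)
for lhs, rhs, support, confidence in rules:
    print(f"{lhs} -> {rhs}: support={support:.2f}, confidence={confidence:.2f}")
```

On a dataset of the scale described, the same counting step would typically be expressed as a distributed aggregation over user records rather than an in-memory loop.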
Keywords/Search Tags:distributed processing, feature extraction, clustering, sequential pattern mining, association analysis