Research On Large Scale Web User Behavior Analysis Based On Apsara Cloud Platform

Posted on: 2015-10-01
Degree: Master
Type: Thesis
Country: China
Candidate: X X Hu
GTID: 2308330485990850
Subject: Computer technology
Abstract/Summary:
With the growing popularity of information technology, the Internet has become indispensable to people's daily lives, and people exhibit a wide range of behaviors online. User behavior analysis helps to build personalized web services, enabling more accurate ad targeting and personalized recommendation. As a result, more and more research focuses on web user behavior.

Current research on web user behavior mainly consists of statistical analysis of user behavior patterns. Some work also analyzes the web content that users visit, but only at a shallow level, and most of this work is based on small-scale user behavior data. Research on large-scale web user behavior is therefore insufficient, and there is still no general framework for analyzing it. Distributed cloud platforms are now a common solution for storing and processing massive data. To address both the high time cost of filtering massive numbers of web pages by means of classification and the shallow level at which visited web page content is analyzed, we use the distributed processing capability of the Apsara cloud platform to process and analyze massive web user behavior data. We extract a wealth of field attribute information at the user level, which meets the requirement of analyzing massive web user behavior data efficiently.

The following work has been done in this dissertation:

1. Based on an analysis of the Apsara cloud platform, we design a system architecture for analyzing and processing massive web user behavior data. The architecture is composed of seven modules: network user behavior logging, web content crawling, web page cleaning and keyword extraction, efficient field-specific web page filtering, web page attribute generation, user attribute generation, and statistical analysis. The architecture can effectively support massive web user behavior analysis on the Apsara cloud platform. Subsequent studies show that the framework takes full advantage of Apsara's processing capability and provides convenient and efficient data processing.

2. To reduce the processing time of large-scale web page filtering by means of classification, we propose a two-phase joint filtering strategy. In the web data collected on the Apsara platform, the number of web pages browsed by web users averages ten billion. To avoid the high time cost of conventional field classification, we design a two-stage processing method: pages are first filtered with a field dictionary and then classified by a classifier. The proposed method reduces the processing time significantly.

3. To address the shallow level at which visited web page content is analyzed, we extract deep user attribute information from web browsing behavior. By building domain descriptions, we extract the category, sub-category, and other attribute information from the web pages that users browsed, using a multi-level classification method. The user attribute information is obtained by aggregating the web browsing log with the web page attribute information. In addition, to make statistical analysis convenient, the user attribute information for a given period is aggregated incrementally with low space complexity.

This dissertation studies the key technologies of fast field filtering of massive web page data and of extracting user attribute information. On the basis of this research, we implement a system for analyzing and mining massive web user behavior. The experimental results show the effectiveness and efficiency of the technical solution proposed in the thesis.
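The two-stage filtering idea in item 2 can be illustrated with a minimal Python sketch. This is not the thesis implementation: the field terms, the threshold values, and the names dictionary_prefilter, toy_classifier, and two_phase_filter are all hypothetical, and the "classifier" is a simple stand-in for whatever trained model the author used. The sketch only shows why a cheap dictionary check placed in front of a costly classifier reduces the number of classifier calls.

```python
# Hypothetical field dictionary for an assumed "finance" field.
FIELD_TERMS = {"loan", "mortgage", "insurance", "stock"}


def dictionary_prefilter(keywords, field_terms=FIELD_TERMS, min_hits=1):
    """Phase 1: cheap set lookup; keep a page only if it mentions
    at least `min_hits` distinct dictionary terms."""
    return len(set(keywords) & field_terms) >= min_hits


def toy_classifier(keywords):
    """Phase 2 stand-in (assumption): treat a page as in-field when at
    least half of its keywords are dictionary terms. A real system
    would call a trained classifier here."""
    hits = sum(1 for k in keywords if k in FIELD_TERMS)
    return hits / max(len(keywords), 1) >= 0.5


def two_phase_filter(pages, classifier=toy_classifier):
    """Run the dictionary filter first and the classifier only on the
    survivors. Returns the kept URLs and the number of classifier calls."""
    kept = []
    classifier_calls = 0
    for url, keywords in pages:
        if not dictionary_prefilter(keywords):
            continue  # rejected cheaply, classifier never runs
        classifier_calls += 1
        if classifier(keywords):
            kept.append(url)
    return kept, classifier_calls


# Made-up sample input: (url, extracted keywords) pairs.
sample_pages = [
    ("example.com/a", ["loan", "mortgage", "rate"]),
    ("example.com/b", ["football", "score"]),
    ("example.com/c", ["stock", "weather", "travel", "music"]),
]

kept_urls, calls = two_phase_filter(sample_pages)
```

In this sample, the second page never reaches the classifier because it contains no dictionary term, so the classifier runs on two pages instead of three. On a real corpus the saving depends on how selective the dictionary is.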
Keywords/Search Tags: Web User Behavior Analysis, Apsara Cloud Platform, Massive Data Processing, Web Page Filter, User Attribute Information