
Analysis And Research Of Log Data Based On Big Data Platform

Posted on: 2021-01-05  Degree: Master  Type: Thesis
Country: China  Candidate: Z E Liao
GTID: 2518306101474794  Subject: Control Engineering
Abstract/Summary:
In the era of the digital economy, as the mobile Internet and smart devices become ever more integrated into people's lives, people generate a large amount of log data every day in the course of work and entertainment. This log data is an important resource and is well worth mining and analyzing. For example, when operating an app, a company not only acquires new users through advertising and other channels but also analyzes log data to provide users with targeted services, thereby increasing the number of daily active users. However, in the face of massive and diverse log data, existing big data technology still has shortcomings in practice, which prevents companies from using log data to solve operational problems. Against this background, this thesis carries out the following research work.

Firstly, existing big data frameworks are studied in order to design the architecture and deployment plan of the big data platform. Ambari integration scripts are developed for components outside the Hadoop ecosystem, and a big data platform is built on Ambari. In addition, game log data obtained during an internship is used as the data source; it is analyzed and processed on the big data platform to form the experimental data set of this thesis.

Secondly, according to the characteristics of the game business, a portrait index system is constructed from users' basic attributes and behavioral attributes, and a ClickHouse portrait module is used to compute the portrait indicators and obtain individual portrait information. Statistical analysis of this portrait information then yields the users' gender distribution, age distribution, top 10 mobile phone brands, SDK version distribution, high-frequency active time periods, and maximum consecutive login days.

Thirdly, building on applications of the traditional RFM model in the domestic and international literature, and combined with the characteristics of the game business, an RFMD model is proposed. Comparative experiments are designed to determine the clustering algorithm and the number of clusters (K value) best suited to the experimental data set. They show that the KMeans algorithm clusters best when K is 5, and that this result meets the business requirements. The RFMD model and the KMeans algorithm are then combined to divide users into five categories: VIP gamers, advanced gamers, intermediate gamers, junior gamers, and low-level gamers. Detailed analysis of each category yields group portrait information, for which corresponding retention strategies are proposed to support user operations.

Finally, performance comparison experiments are designed to study the clustering algorithms in the big data machine learning libraries Spark ML and Alink. The results show that: (1) on some data sets, Alink's KMeans algorithm in RANDOM mode performs slightly better than the Spark ML KMeans algorithm, and the ratio of the average time consumption of the two is about 1.14; (2) Alink's Gaussian Mixture Model (GMM) and Bisecting KMeans algorithms perform better than the corresponding algorithms in Spark ML, with average time consumption ratios of up to 1.86 and 3.6, respectively.

In summary, this thesis describes the development and construction of the big data platform in detail and conducts user behavior analysis of game log data on that platform, which can serve as a reference for the analysis and application of log data in other scenarios. It also presents a comparative study of the performance of currently popular big data machine learning libraries, which the author hopes will be a useful reference for other practitioners.
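
As an illustration of the portrait indicators mentioned above, the following minimal Python sketch computes the maximum consecutive login days and the three classical RFM values per user. It is not the thesis's implementation, which uses ClickHouse on the big data platform. The record layout `(user_id, login_date, amount_spent)`, the sample `LOGS`, the function names, and the definition of frequency as the number of distinct active days are all assumptions made for illustration. The D dimension of the RFMD model is not defined in the abstract, so it is left out.

```python
from collections import defaultdict
from datetime import date

# Hypothetical log records: (user_id, login_date, amount_spent).
LOGS = [
    ("u1", date(2020, 6, 1), 0.0),
    ("u1", date(2020, 6, 2), 6.0),
    ("u1", date(2020, 6, 3), 0.0),
    ("u1", date(2020, 6, 7), 30.0),
    ("u2", date(2020, 6, 5), 0.0),
    ("u2", date(2020, 6, 5), 1.0),
]


def max_consecutive_days(days):
    """Return the length of the longest run of consecutive calendar days."""
    ordered = sorted(set(days))  # drop duplicate logins on the same day
    best = run = 1 if ordered else 0
    for prev, cur in zip(ordered, ordered[1:]):
        # Extend the current run if the days are adjacent, else restart it.
        run = run + 1 if (cur - prev).days == 1 else 1
        best = max(best, run)
    return best


def user_indicators(logs, today):
    """Compute per-user RFM values and maximum consecutive login days."""
    by_user = defaultdict(list)
    for user, day, amount in logs:
        by_user[user].append((day, amount))

    result = {}
    for user, rows in by_user.items():
        days = [d for d, _ in rows]
        result[user] = {
            # Recency: days since the most recent login.
            "recency_days": (today - max(days)).days,
            # Frequency: number of distinct active days (an assumption).
            "frequency": len(set(days)),
            # Monetary: total amount spent.
            "monetary": sum(a for _, a in rows),
            "max_consecutive_login_days": max_consecutive_days(days),
        }
    return result


indicators = user_indicators(LOGS, today=date(2020, 6, 10))
```

Per-user values of this kind would typically be standardized before being passed to a clustering algorithm such as KMeans, since recency, frequency, and spending are on different scales.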
Keywords/Search Tags:big data platform, log analysis, user clustering, Alink