
Danmaku Data Mining Algorithm Study Based On Quantity Characteristics

Posted on: 2018-09-30
Degree: Master
Type: Thesis
Country: China
Candidate: K Y Guo
GTID: 2428330563993043
Subject: Software engineering

Abstract/Summary:
Danmaku on video sites (comments that scroll across the video, synchronized to playback time) are viewers' explicit feedback on video content, and analyzing danmaku data can reveal much hidden, valuable information. Data mining based on semantic analysis of danmaku currently faces many difficulties. By comparison, data mining based on the quantitative features of danmaku data is more feasible and gives more accurate results. A danmaku crawler was designed around the web interface of LETV and used to collect the danmaku data of domestic TV series released from 2014 to 2016, which serve as the object of data mining. The danmaku data are analyzed from three perspectives: (1) analyzing the distribution of TV drama popularity; (2) clustering TV series by the statistical characteristics of their danmaku, then classifying and evaluating the series based on the clustering results; and (3) recognizing danmaku high-energy points and using the recognition results to evaluate how watchable a TV drama is and to enhance interaction in the video system.

For the first perspective, a plain definition and a modified definition of TV drama popularity are given, together with hypothesis-testing methods for the statistical distribution of drama popularity. For the second perspective, a feature vector representing the statistical characteristics of a TV drama is constructed for the clustering procedure, and the K-means algorithm is used to perform the clustering. For the third perspective, a high-energy point recognition algorithm based on three-point decision is proposed to identify high-energy points automatically, replacing manual identification by video editors. Evaluation of its performance and effectiveness shows that the danmaku high-energy point recognition algorithm is satisfactory.

The analysis of the danmaku data shows that the distribution of danmaku counts for domestic TV dramas released on LETV from 2014 to 2016 follows the 80/20 rule. Under the modified definition of popularity, the popularity of domestic TV dramas follows a normal distribution. Clustering the TV series by their statistical characteristics gives good results: the features of each cluster are clear, and the series were classified and rated according to the clustering results. The danmaku high-energy point recognition algorithm was applied to 334 TV series, and the number of high-energy points was used to evaluate how watchable each drama is. The algorithm also gave good results for increasing interaction and for placing advertisements while users watch videos.
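The 80/20 finding can be checked by ranking dramas by danmaku count and measuring what share of the total the top 20% hold. The sketch below assumes that "danmaku number" means the total danmaku count per drama; the function name `top_share` and the counts are hypothetical illustrations, not the thesis's code or data.

```python
def top_share(counts, top_fraction=0.2):
    """Return the share of the total danmaku count held by the top
    `top_fraction` of dramas, ranked by danmaku count (hypothetical helper)."""
    if not counts:
        raise ValueError("counts must be non-empty")
    ranked = sorted(counts, reverse=True)
    # Size of the "top" group; always at least one drama.
    k = max(1, round(len(ranked) * top_fraction))
    total = sum(ranked)
    return sum(ranked[:k]) / total if total else 0.0

# Hypothetical per-drama danmaku counts (NOT data from the thesis).
counts = [9000, 7000, 1000, 800, 600, 500, 400, 300, 250, 150]
print(f"Top 20% of dramas hold {top_share(counts):.0%} of all danmaku")
# prints "Top 20% of dramas hold 80% of all danmaku"
```

A share near 0.8 is consistent with the 80/20 rule; a formal goodness-of-fit test, such as the hypothesis tests the abstract mentions, would still be needed to support the claim statistically.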
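The abstract does not list the components of the feature vector used for clustering. The sketch below shows plain K-means (Lloyd's algorithm) on hypothetical two-dimensional, already-normalized features per drama, for example a normalized total danmaku count and a normalized per-episode spread. The feature choice, `k`, and the seeded random initialization are assumptions for illustration only.

```python
import math
import random

def kmeans(points, k, max_iters=100, seed=0):
    """Lloyd's K-means; returns (labels, centroids)."""
    rng = random.Random(seed)
    # Initialize centroids with k distinct input points chosen at random.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iters):
        # Assignment step: each point joins its nearest centroid (Euclidean).
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments unchanged, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Hypothetical normalized feature vectors, one per drama (NOT thesis data).
points = [(0.1, 0.2), (0.15, 0.1), (0.2, 0.15), (0.9, 0.8), (0.85, 0.95), (0.95, 0.9)]
labels, centroids = kmeans(points, k=2)
print(labels, centroids)
```

Because features on different scales would dominate the Euclidean distance, normalizing each feature before clustering is the usual choice. The resulting cluster centroids are what one would inspect to describe and rate each group of series.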
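The abstract names a high-energy point recognition algorithm "based on three-point decision" but does not describe it. As a purely hypothetical illustration of decision-style thresholding on danmaku quantities, the sketch below sorts fixed-length time windows into three regions: accept (count at or above an upper threshold `alpha`), reject (count at or below a lower threshold `beta`), and defer (in between). Deferred windows are accepted only if they are a local peak. The window counts, thresholds, and local-peak rule are all assumptions, not the thesis's method.

```python
def high_energy_points(counts, alpha, beta):
    """Return indices of time windows labeled as high-energy points.

    counts: danmaku count per fixed-length time window (hypothetical input).
    alpha > beta: upper and lower thresholds (hypothetical parameters).
    """
    if alpha <= beta:
        raise ValueError("alpha must exceed beta")
    result = []
    for i, c in enumerate(counts):
        if c >= alpha:
            result.append(i)  # accept region: clearly high-energy
        elif c > beta:
            # Defer region: resolve by checking for a strict local peak.
            left = counts[i - 1] if i > 0 else float("-inf")
            right = counts[i + 1] if i + 1 < len(counts) else float("-inf")
            if c > left and c > right:
                result.append(i)
        # c <= beta: reject region, not a high-energy point
    return result

# Hypothetical per-window danmaku counts (NOT thesis data).
windows = [3, 5, 40, 12, 18, 9, 4, 25, 2]
print(high_energy_points(windows, alpha=30, beta=10))  # prints [2, 4, 7]
```

Under this sketch, counting the returned indices per drama gives a simple number that could feed the kind of watchability comparison the abstract describes.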
Keywords/Search Tags: Danmaku, Data mining, Statistical distribution, Clustering, High energy point recognition