Video User Classification Based On Improved TF-IDF Algorithm And Preference

Posted on: 2019-07-13
Degree: Master
Type: Thesis
Country: China
Candidate: F F Li
Full Text: PDF
GTID: 2438330545995575
Subject: Computer technology
Abstract/Summary:
With the rapid development of multimedia and networking, a number of new industries have emerged and profoundly changed people's living habits and their spiritual and cultural needs. Online video in particular has grown by leaps and bounds in recent years, attracting more and more users with its rich expression and fast delivery of information. However, because video data is rich in content, huge in volume, and diverse in structure, it poses a great challenge to user retrieval: it is increasingly hard for users to pick out, in a short time, the videos they really like from a massive selection. To address this, this thesis builds a classification model of users' age range from their video preferences and user characteristics. The model is intended to help video website operators recommend information to users accurately and improve the utilization of information. The thesis mainly studies video user classification based on an improved TF-IDF algorithm and preference. The main work is as follows:

1st, the theory and technology of the Spark computing platform are studied. Within the Spark computing framework, the videos a user watches and the user's preference for them are used as feature items, and three age-range classification models are trained: Naive Bayes without feature-item weights, Naive Bayes with TF-IDF weights, and Naive Bayes with the improved TFC-IDFC weights. Their classification performance is then compared.

2nd, age is discretized and labeled, and the processed data is represented in a vector space model.

3rd, current studies of user preference for videos are generally based on click counts. However, click counts cannot fully reflect a user's preference, since unintentional invalid clicks, speculative clicks, and the like say nothing about what the user actually likes. This thesis proposes a more accurate way to calculate user preference, based on the viewing time and the actual length of the video.

4th, an improved TFC-IDFC algorithm is proposed, motivated by the fact that the traditional TF-IDF algorithm does not reflect how feature items are distributed within and between classes. A comparison of the three classification algorithms on accuracy and F1 value shows that the weighted classification algorithms outperform the unweighted one, and that the improved TFC-IDFC algorithm outperforms the traditional TF-IDF algorithm.
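
The preference calculation and the TF-IDF baseline described above can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the abstract gives neither the preference formula nor the TFC-IDFC weighting, so the sketch assumes preference is the fraction of a video actually watched (capped at 1) and shows only the traditional TF-IDF weighting that the improved algorithm is compared against, treating each user as a document and each video as a term. The names `preference` and `tf_idf`, and the use of preference as the term frequency, are assumptions made for illustration.

```python
import math
from collections import defaultdict


def preference(watch_seconds, video_seconds):
    """Assumed preference score: fraction of the video watched, clipped to [0, 1]."""
    if video_seconds <= 0:
        return 0.0
    return max(0.0, min(watch_seconds / video_seconds, 1.0))


def tf_idf(user_prefs):
    """Traditional TF-IDF over a {user: {video: preference}} mapping.

    Assumption: each user plays the role of a document, each video the role
    of a term, and the preference score stands in for the raw term count.
    """
    n_users = len(user_prefs)

    # Document frequency: number of users with a non-zero preference per video.
    df = defaultdict(int)
    for prefs in user_prefs.values():
        for video, p in prefs.items():
            if p > 0:
                df[video] += 1

    weights = {}
    for user, prefs in user_prefs.items():
        total = sum(prefs.values())
        # tf = share of the user's total preference; idf = log(N / df).
        weights[user] = {
            video: (p / total) * math.log(n_users / df[video])
            for video, p in prefs.items()
            if p > 0
        }
    return weights
```

Under this scheme a video watched by every user gets weight 0, since log(N/df) = 0. By the abstract's account, what plain TF-IDF misses is how a feature item is distributed within and between the age classes, which is what the TFC-IDFC variant is meant to capture.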
Keywords/Search Tags: TF-IDF, preference, Spark, video, user characteristics