
Research And Analysis Of Chinese Social Website User's Attribute Inference Based On WEB

Posted on: 2018-07-13
Degree: Master
Type: Thesis
Country: China
Candidate: X Lu
GTID: 2348330536487949
Subject: Software engineering
Abstract/Summary:
With the popularity of social websites, users generate massive amounts of valuable data on them every day. To protect their privacy, however, users often fill in false information or leave their profile fields blank, so inferring these hidden attributes has become a hot research topic. In this paper, we take Sina Weibo users as the research object: we collect their information and infer their attributes, namely gender, age distribution, and education distribution. The main work of this paper is as follows:

1) Four algorithms for inferring a user's gender are proposed: the Gender Inference Algorithm Based on Nickname (GIABON), the Gender Inference Algorithm Based on Label (GIABOL), the Gender Inference Algorithm Based on Micro-blog Text (GIABOMT), and the Gender Inference Algorithm Based on Mean (GIABOM). Experiments show that the accuracy of GIABOM reaches 85.55%, much higher than that of the other three algorithms, which indicates that it is reasonable to take all attributes into consideration when inferring a user's gender.

2) A User's Age Distribution Inference Algorithm Based on Combined Parameters and Characteristic Property of Support Vector Machine Optimized by Genetic Algorithm is proposed. The kernel function and the characteristic properties are very important for the performance of a Support Vector Machine (SVM). In this paper, we select the linear kernel function, the radial basis function (RBF), and the radial basis function optimized by a genetic algorithm as the kernel function of the SVM. Our experiments show that the accuracy reaches 75.38% with the linear kernel function, 86.14% with the RBF, and 89.11% with the proposed algorithm. These results show that the proposed algorithm infers users' age distribution better than a standard SVM.

3) A User's Education Distribution Inference Algorithm Based on Combined Parameters and Characteristic Property of Support Vector Machine Optimized by Genetic Algorithm is proposed. It is similar to the age distribution inference algorithm above. Experiments show that the accuracy reaches 81.38% with the linear kernel function, 92.14% with the RBF, and 93.03% with the proposed algorithm. These results show that the proposed algorithm infers users' education distribution better than a standard SVM.
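The abstract does not specify how the genetic algorithm is set up, so the following is only a minimal sketch of the general idea of searching RBF-SVM parameters with a genetic algorithm. Everything in it is an assumption made for illustration: the function names `ga_search` and `toy_fitness`, the choice of searching over log2(C) and log2(gamma), the search bounds, and the selection, crossover, and mutation operators. The fitness function here is a toy stand-in; in a real experiment it would train an SVM with the candidate parameters and return its cross-validated accuracy.

```python
import random


def ga_search(fitness, bounds, pop_size=20, generations=30,
              mutation_rate=0.2, seed=0):
    """Maximise fitness(*genes) with a simple real-coded genetic algorithm.

    bounds is a list of (low, high) pairs, one per gene.
    Assumes pop_size >= 4 so that at least two parents survive selection.
    """
    rng = random.Random(seed)

    # Initial population: uniform random points inside the search box.
    population = [[rng.uniform(lo, hi) for lo, hi in bounds]
                  for _ in range(pop_size)]
    best = max(population, key=lambda ind: fitness(*ind))

    for _ in range(generations):
        ranked = sorted(population, key=lambda ind: fitness(*ind), reverse=True)
        parents = ranked[: pop_size // 2]      # truncation selection
        children = [ranked[0][:]]              # elitism: keep the current best
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            w = rng.random()
            # Arithmetic crossover: a random blend of the two parents.
            child = [w * x + (1 - w) * y for x, y in zip(a, b)]
            for i, (lo, hi) in enumerate(bounds):
                # Gaussian mutation, scaled to the width of the gene's range.
                if rng.random() < mutation_rate:
                    child[i] += rng.gauss(0, 0.1 * (hi - lo))
                # Keep every gene inside its bounds.
                child[i] = max(lo, min(hi, child[i]))
            children.append(child)
        population = children
        best = max(population + [best], key=lambda ind: fitness(*ind))
    return best


def toy_fitness(log2_c, log2_gamma):
    # Hypothetical stand-in for cross-validated SVM accuracy,
    # with an arbitrary peak at log2(C) = 3 and log2(gamma) = -5.
    return 1.0 - 0.01 * ((log2_c - 3.0) ** 2 + (log2_gamma + 5.0) ** 2)


# Assumed search ranges for log2(C) and log2(gamma).
BOUNDS = [(-5.0, 15.0), (-15.0, 3.0)]
best_log2_c, best_log2_gamma = ga_search(toy_fitness, BOUNDS)
print("C =", 2 ** best_log2_c, "gamma =", 2 ** best_log2_gamma)
```

Searching in log space is a common choice because useful values of C and gamma span several orders of magnitude. Elitism ensures the returned candidate is never worse than the best member of the initial population.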
Keywords/Search Tags:User's Attribute, Social WebSites, Support Vector Machine, Genetic Algorithm, Kernel Function, Chinese Segmentation