
Research on User Profile and Community Discovery Based on Weibo Topic Classification

Posted on: 2022-01-09
Degree: Master
Type: Thesis
Country: China
Candidate: X K Peng
Full Text: PDF
GTID: 2518306575967009
Subject: Computer technology
Abstract/Summary:
With the spread of the Internet and the growing demand for online social interaction, Weibo has developed rapidly in recent years, and its number of registered users has grown quickly. As a popular source of social media big data, Weibo contains a large amount of user information and therefore has considerable research value. Its posts are concise and spread rapidly. However, the sheer volume and variety of content on Weibo causes information overload, which makes it difficult for users to find the content and the other users they are interested in. The key to alleviating this problem is to build accurate user profiles. Once the profile data is available, it can be used for community discovery, that is, for finding users who share the same interest preferences.

Existing methods for building Weibo user profiles are mainly based on the LDA topic model, but profiles built this way suffer from unclear themes and low semantic consistency among the extracted keywords. To address these problems, this thesis uses a neural network model to classify Weibo text by topic. Existing text classification models can be divided into pre-trained models and shallow models: pre-trained models achieve the best accuracy but have a large number of parameters and a long prediction time, while shallow models are the opposite. To resolve this trade-off, this thesis builds a student model based on the idea of knowledge distillation: the pre-trained model serves as the teacher model, and the knowledge it has learned is transferred to the shallow model. The student model combines a convolutional neural network and a recurrent neural network, which capture the local and global features of the data respectively. On this basis, adversarial perturbation training and label smoothing are combined to further improve the robustness and accuracy of the model. Experiments show that the proposed text classification model achieves strong classification performance while keeping a small number of parameters and a short prediction time.

The resulting model is used to analyze users' microblogs. The distribution of a user's interests across different fields serves as a coarse-grained interest profile. The model also groups a user's microblogs by interest area, and the TF-IDF method is then used to extract keywords from each group, which gives a fine-grained description of the user's concerns within each interest area. The interest profile modeling method in this thesis produces clear themes and semantically consistent keywords, and so describes user interests accurately. Once accurate user profiles are available, a fuzzy clustering method is used to discover communities, so that users with the same interests are grouped together.
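The abstract does not state the training objective, so the following is only a minimal sketch of how knowledge distillation and label smoothing are commonly combined for a single training example. It assumes the widely used temperature-scaled formulation; the function names and the `temperature`, `alpha`, and `epsilon` values are illustrative assumptions, not taken from the thesis, and the adversarial perturbation step is omitted.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def smooth_labels(true_index, num_classes, epsilon=0.1):
    """Label smoothing: spread epsilon uniformly over all classes."""
    off_value = epsilon / num_classes
    return [
        1.0 - epsilon + off_value if i == true_index else off_value
        for i in range(num_classes)
    ]


def cross_entropy(target, predicted):
    """Cross-entropy between a target and a predicted distribution."""
    return -sum(t * math.log(p) for t, p in zip(target, predicted) if t > 0)


def distillation_loss(student_logits, teacher_logits, true_index,
                      temperature=2.0, alpha=0.5, epsilon=0.1):
    """Weighted sum of a hard-label term and a teacher-matching term.

    All hyperparameter defaults are hypothetical, chosen for illustration.
    """
    num_classes = len(student_logits)
    # Hard term: student predictions against the smoothed true label.
    hard = cross_entropy(
        smooth_labels(true_index, num_classes, epsilon),
        softmax(student_logits),
    )
    # Soft term: student against the teacher's softened distribution,
    # scaled by T^2 as is conventional for temperature-scaled distillation.
    soft = cross_entropy(
        softmax(teacher_logits, temperature),
        softmax(student_logits, temperature),
    ) * temperature ** 2
    return alpha * hard + (1.0 - alpha) * soft


# A student that agrees with the teacher incurs a lower loss than one that does not.
loss_agree = distillation_loss([3.0, 0.0, 0.0], [3.0, 0.0, 0.0], true_index=0)
loss_disagree = distillation_loss([0.0, 3.0, 0.0], [3.0, 0.0, 0.0], true_index=0)
```

In a real setup the teacher logits would come from the pre-trained model and the student logits from the CNN and recurrent student network, with the loss averaged over a batch.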
Keywords/Search Tags:Text Classification, User Profile, Knowledge Distillation, Adversarial Perturbation, Community Discovery