
User Preference Analysis Based On Text Classification And Topic Model

Posted on: 2018-11-11
Degree: Master
Type: Thesis
Country: China
Candidate: D J Yu
Full Text: PDF
GTID: 2348330533959888
Subject: Computer technology
Abstract/Summary:
User preference analysis is the basis of personalized information services. A user preference is the rational choice a user makes when selecting among goods or services. The primary purpose of user preference analysis is to filter a user's interests out of a large amount of information so that more personalized services can be provided. However, existing methods for user preference analysis still have many problems. On the one hand, most of them analyze the inherent attributes of users, which makes fine-grained preference analysis difficult. On the other hand, they are deficient in accuracy and efficiency.

User preferences can be obtained by mining user behavior, so fine-grained preferences can be derived by finely classifying and clustering the content that users browse. First, a label is a finer-grained representation than a class, and a single piece of content can carry several labels, so labeling at different levels provides users' preference characteristics at different levels. Second, clustering groups together content that users actively associate according to their latent cognition, which provides preference characteristics at the level of user behavior.

In this paper, we present two labeling algorithms and an optimized hierarchical clustering algorithm on undirected graphs.

The first is a weighted supervised LDA algorithm (WLLDA), which uses the chi-square statistic to reduce the dimensionality of text features. WLLDA incorporates a new weighted bag-of-words model that raises the weight of words that are meaningful for topic classification relative to the original bag of words, which increases the differences between topics and improves classification accuracy. A multi-model ensemble trains separate models for topics of different frequencies, which removes the interference that an unbalanced corpus causes in a single model. A new method for calculating the closeness degree of a topic is also proposed: starting from the original topic probability, it additionally uses keyword hit frequency, frequency, and label support, and it improves the accuracy of topic prediction.

The second is a labeling algorithm based on word2vec, which uses CRF to extract keywords from text. It clusters the word vectors and LR keywords generated with word2vec to construct the label set, which avoids the incomplete coverage of a manually built label library. Finally, the text is labeled by comparing the similarity between the word vectors generated from the denoised text and the word vectors of the labels.

The third is a parallel optimization method for hierarchical clustering on undirected graphs. It models users' active search intentions as an undirected graph and splits hot nodes to weaken the negative impact of the attenuation factor on them.

Together, these three algorithms convert a user's preference for content into a preference for labels, describe fine-grained user preference characteristics, and ultimately improve accuracy and speed.
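
As an illustration of the chi-square dimensionality reduction mentioned in the abstract, the following is a minimal sketch and not the WLLDA implementation. The abstract does not say how the statistic is computed or aggregated, so this sketch assumes the standard 2x2 term/class contingency form, chi2 = N(AD - BC)^2 / ((A+C)(B+D)(A+B)(C+D)), and assumes each term is scored by its maximum value over all classes. The function names `chi_square_scores` and `select_top_terms` and the toy corpus are hypothetical.

```python
from collections import Counter, defaultdict


def chi_square_scores(docs, labels):
    """Score each term by its largest chi-square value over all classes.

    docs   -- list of token lists (one list per document)
    labels -- list of class labels, aligned with docs
    Assumption: max-over-classes aggregation; the abstract does not specify one.
    """
    n_docs = len(docs)
    class_counts = Counter(labels)        # class -> number of documents
    term_class = defaultdict(Counter)     # term -> class -> docs containing term
    term_docs = Counter()                 # term -> docs containing term

    for tokens, label in zip(docs, labels):
        for term in set(tokens):          # count document presence, not raw frequency
            term_class[term][label] += 1
            term_docs[term] += 1

    scores = {}
    for term, per_class in term_class.items():
        best = 0.0
        for label, n_class in class_counts.items():
            a = per_class[label]          # in class, contains term
            b = term_docs[term] - a       # other classes, contains term
            c = n_class - a               # in class, lacks term
            d = n_docs - a - b - c        # other classes, lacks term
            denom = (a + c) * (b + d) * (a + b) * (c + d)
            if denom:
                best = max(best, n_docs * (a * d - b * c) ** 2 / denom)
        scores[term] = best
    return scores


def select_top_terms(docs, labels, k):
    """Keep the k highest-scoring terms (ties broken alphabetically)."""
    scores = chi_square_scores(docs, labels)
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


# Toy corpus (hypothetical data, for illustration only).
docs = [
    ["cheap", "phone", "deal"],
    ["phone", "battery", "the"],
    ["football", "match", "the"],
    ["football", "goal", "deal"],
]
labels = ["tech", "tech", "sport", "sport"]
vocabulary = select_top_terms(docs, labels, 2)
```

In this toy corpus, terms that occur equally often in both classes (such as "the") receive a score of zero and are dropped first, which is the effect the feature reduction step relies on.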
Keywords/Search Tags: user preference analysis, text classification, topic model, latent Dirichlet allocation, word vector, hierarchical clustering, graph clustering