| Social tags are important data generated by Internet users in the Web 2.0 era. Users review and tag Web resources on free and open network platforms according to their own understanding, generating a large volume of social tagging information comprising resources, users, and tags. However, social tags are strongly independent, spontaneous, and open, which leads to problems such as semantic ambiguity, word redundancy, and resource independence. A topic identification method can therefore reveal the latent knowledge embedded in social tags. This thesis carries out the following research on topic identification for social tags: (1) For topic identification of the resources in social tags, a method based on linear regression is proposed. Because the resources in social tags carry a large amount of latent semantics, the latent connections between resources are studied by fitting the resources with a linear regression model: a fitted curve over the resources is constructed, from which the dispersion distance of each resource is obtained. On this basis, a weighting method yields the weight of each feature word, and the resulting feature-word weight vector is applied to the LDA model to form the Feature word Weighting-Latent Dirichlet Allocation (FW-LDA) topic model. Experiments show that the linear regression-based FW-LDA identifies topics better than other related topic models. (2) For topic identification of both resources and tags in social tags, a method based on information entropy similarity is proposed. Tags have a special compositional structure and semantic information, and both kinds of text, resources and tags, are independent and identically distributed across resources. First, a method based on information entropy similarity is proposed to eliminate the independence of resources and tags. This method constructs an undirected graph of latent relationships for resources and for tags separately, and uses a random walk to obtain the weight of each resource and each tag. Second, on this basis, the feature-word weight vectors of resources and tags are obtained by a weighting method and combined with LDA to form the Joint Feature Word Weighting-Latent Dirichlet Allocation (JFWW-LDA) topic model. Experiments show that the JFWW-LDA topic model identifies topics better than other related topic models. (3) Traditional topic models cannot make effective use of resource labels for topic identification and therefore cannot achieve fine-grained results; to address this, a fine-grained topic identification method based on deep learning is proposed. First, text corpora of different categories are labeled, and an Attention-based Text Convolutional Neural Network (Attention-Text CNN, ATT-TCNN) text classification model is proposed. After passing through ATT-TCNN, the corpus with mixed labels becomes a corpus organized by class label. Second, on this basis, an LDA topic model based on ATT-TCNN (ATT-TCNN-LDA) is proposed, in which each classified corpus is grouped into topic clusters by LDA, achieving more fine-grained topic identification. Experiments show that ATT-TCNN classifies better than other related classification models, and that ATT-TCNN-LDA identifies topics better than other related topic models. Figure [33] Table [11] Reference [93]... |
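The random-walk weighting step in part (2) can be illustrated with a minimal sketch. The abstract does not give the thesis's exact formulation, so everything specific here is an assumption: the function name `random_walk_weights`, the damping factor of 0.85, the PageRank-style restart, and the small similarity matrix `sim` are all hypothetical. In the thesis the edge weights would come from information entropy similarity, which is not reproduced here.

```python
def random_walk_weights(sim, damping=0.85, tol=1e-10, max_iter=1000):
    """Stationary weights of a random walk with restart on an undirected graph.

    sim is a symmetric matrix of non-negative edge weights (hypothetical
    stand-in for the information-entropy similarity used in the thesis).
    """
    n = len(sim)
    # Row-normalise similarities into transition probabilities.
    # A node with no edges jumps uniformly to any node.
    trans = []
    for row in sim:
        total = sum(row)
        trans.append([v / total for v in row] if total > 0 else [1.0 / n] * n)

    weights = [1.0 / n] * n  # start from the uniform distribution
    for _ in range(max_iter):
        new = [
            (1 - damping) / n
            + damping * sum(weights[j] * trans[j][i] for j in range(n))
            for i in range(n)
        ]
        converged = sum(abs(a - b) for a, b in zip(new, weights)) < tol
        weights = new
        if converged:
            break
    return weights


# Hypothetical similarity matrix for four resources (assumed values).
sim = [
    [0.0, 0.8, 0.6, 0.0],
    [0.8, 0.0, 0.7, 0.1],
    [0.6, 0.7, 0.0, 0.0],
    [0.0, 0.1, 0.0, 0.0],
]
weights = random_walk_weights(sim)
```

In this sketch, strongly connected resources receive larger weights than weakly connected ones; such weights could then be used to scale feature-word counts before fitting LDA, though how the thesis combines them is not specified in the abstract.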