With the rapid spread of platforms such as social networks, the volume of social network data has grown, making information processing more difficult. Labels offer a way to quickly identify the content people are interested in within big data, and they are also valuable for research areas such as information recommendation. Extracting high-quality labels from social network data is therefore important. Traditional methods do not fully consider the characteristics of social network data, such as the influence of network structure on label extraction. To exploit the way network structure complements label extraction based on text or content information, this work proposes label extraction methods that address three characteristics of social network data: combining text data with network structure for label extraction, extracting user interest labels given the massive scale and continuous arrival of social network data, and extracting real-time user interest labels given the dynamic nature of the data. The three methods are as follows.

(1) To address the insufficient use of information in social network data, which limits the accuracy of label extraction, a systematic user interest label extraction approach, UNITE (User-Networked Interest Topic Extraction), is proposed. It combines the text content of Weibo with social network information and exploits the overlap between the labels of a user's social neighbors and those of the user. UNITE is further extended to special cases of social networks, such as scientific research cooperation networks. Based on this approach, UNITE_COKE (UNITE based phrase-Co-Occurrence-enhanced Keyphrase Extraction) is designed to improve keyword extraction from scientific literature by using high-frequency word pairs from a large corpus. Experiments show that using both text content and social network structure to improve the quality of label extraction is broadly applicable and significant.

(2) To address the difficulty of processing large-scale social network structure and the continuous arrival of social network data when extracting user interest labels in traditional social networks, a method called UNITE_SS (User-Networked Interest Topic Extraction in the form of Subgraph Stream) is proposed. It first transforms the large-scale social network structure into a "subgraph stream" data structure and then extracts user interest labels on that structure. Experiments show that UNITE_SS reduces computational cost while maintaining the accuracy of user interest label extraction. They also show that the "subgraph stream" data structure not only supports user interest label extraction from social networks but also offers a feasible solution for large-scale graph computing under resource constraints.

(3) To address the fact that the dynamic characteristics of large-scale social networks are not fully considered, which leaves the extracted interest labels lacking timeliness, a "process on arrival" strategy is proposed on top of the "subgraph stream" data structure from the preceding work, so that user interest labels are extracted in real time. Then, by combining the upper and lower approximations of rough set theory with specially designed user temporal features, RS_UNITE_SS (Rough Set based User-Networked Interest Topic Extraction in the form of Subgraph Stream), a subgraph-stream-based method for extracting user interest labels in large-scale social networks, is proposed. Experiments on two real data sets verify that RS_UNITE_SS achieves a balance between accuracy and efficiency in user interest label extraction.
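
As a rough illustration of the rough set upper and lower approximations mentioned in (3), the sketch below computes both for a target set of users under an equivalence relation. It is not the RS_UNITE_SS method itself, only the standard textbook definitions. The function names, the toy users, and the attribute used to group them (an activity level) are all assumptions made for illustration.

```python
from collections import defaultdict


def equivalence_classes(universe, key):
    """Group objects that are indiscernible under `key`.

    `key` is a hypothetical attribute function: two objects fall into the
    same class when `key` returns the same value for both.
    """
    classes = defaultdict(set)
    for obj in universe:
        classes[key(obj)].add(obj)
    return list(classes.values())


def rough_approximations(universe, key, target):
    """Return (lower, upper) approximations of `target` within `universe`.

    lower: union of equivalence classes fully contained in `target`
    upper: union of equivalence classes that intersect `target`
    """
    target = set(target)
    lower, upper = set(), set()
    for cls in equivalence_classes(universe, key):
        if cls <= target:   # every member is certainly in the target
            lower |= cls
        if cls & target:    # at least one member is in the target
            upper |= cls
    return lower, upper


# Toy data (assumed): users grouped by a made-up activity level.
activity = {"u1": "high", "u2": "high", "u3": "low", "u4": "low", "u5": "mid"}
# Assumed target set: users known to hold some interest label.
interested = {"u1", "u2", "u3"}

lower, upper = rough_approximations(activity.keys(), activity.get, interested)
boundary = upper - lower

print(sorted(lower))     # users certainly in the target set
print(sorted(upper))     # users possibly in the target set
print(sorted(boundary))  # users whose membership is uncertain
```

In this toy setting, the lower approximation holds users whose whole equivalence class carries the label, and the boundary region (upper minus lower) holds users for whom the grouping attribute alone cannot decide membership.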