
Research On Web Text Representation Based On Social Attributes

Posted on: 2018-06-07
Degree: Master
Type: Thesis
Country: China
Candidate: G Chen
Full Text: PDF
GTID: 2358330536988537
Subject: Computer application technology
Abstract/Summary:
With the growing number of Internet users and the rapid development of social media platforms, the number of web pages on the Internet has grown explosively. When people communicate with one another on social platforms through smart devices, they also produce large amounts of text data. How to organize and process this massive volume of text effectively, and how to mine the hidden, valuable information in it, is an urgent problem. Text representation occupies a very important position in text analysis: it transforms real-world text into a form that a machine can process. Most traditional text representation methods focus on document content, and their feature items are extracted directly from that content. These methods therefore ignore the interaction between a text and its external environment, as well as the relationships among texts, so they cannot capture the characteristics of a text comprehensively.

As social networks generate large amounts of social behavior data, some researchers have proposed adding social information to the document representation model and have obtained good results in information retrieval. We likewise add social information to the text representation model and combine it with content characteristics to construct our own model. This approach considers the relationship between a text and user interaction behavior, and it can also effectively address the feature sparsity problem. To address the problems of traditional text representation models, we analyze the traditional content characteristics and, based on the social characteristics we obtained, propose the following solutions:

(1) A multi-layer text representation method is proposed that combines content characteristics, topic characteristics, and shallow social characteristics (user browsing behavior). This method takes both the internal and the external environment of a text into account. We propose a document similarity computation that incorporates social characteristics, so that content, topic, and shallow social characteristics interact with each other, and we use a clustering algorithm to evaluate the performance of the text representation method. We conducted experiments on the Aminer data set, extracting shallow social characteristics from the citation relationships between papers, and content and topic characteristics from the paper content. Our approach takes into account both the interaction between a text and the outside world and the relationships among texts. A large number of experiments verify that including shallow social characteristics in the text representation model improves the clustering results. We also found that shallow social characteristics have strong discriminative ability.

(2) By analyzing the social behavior information (forwarding, commenting, collecting) of web texts, we extracted social combination characteristics and label characteristics from the data set and combined them with topic characteristics to construct a text representation method. Our approach can solve the sparsity problem to some extent, and using both browsing behavior and social behavior information makes the user behavior characteristics more reliable. Experimental results on Sina Weibo datasets show that social behavior characteristics greatly improve the text representation and that text clustering results improve as well.
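
The abstract does not give the formula for the document similarity in solution (1), so the following is only a minimal sketch of one way such a measure could be built, not the thesis's actual method. It assumes a weighted sum of three per-layer similarities: cosine similarity over content vectors, cosine similarity over topic distributions, and Jaccard overlap over sets of social links such as cited papers. The function names, the dictionary layout, and the weights `(0.4, 0.3, 0.3)` are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def jaccard(a, b):
    """Jaccard overlap between two sets of social links (e.g. cited papers)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def combined_similarity(d1, d2, weights=(0.4, 0.3, 0.3)):
    """Weighted sum of content, topic, and social similarity.

    The weights are illustrative assumptions, not values from the thesis.
    """
    w_content, w_topic, w_social = weights
    return (
        w_content * cosine(d1["content"], d2["content"])
        + w_topic * cosine(d1["topic"], d2["topic"])
        + w_social * jaccard(d1["social"], d2["social"])
    )


# Two toy documents: term weights, topic distributions, and citation sets.
doc_a = {
    "content": {"text": 1.0, "cluster": 2.0},
    "topic": {0: 0.9, 1: 0.1},
    "social": {"p1", "p2", "p3"},
}
doc_b = {
    "content": {"text": 1.0, "mining": 1.0},
    "topic": {0: 0.8, 1: 0.2},
    "social": {"p2", "p3", "p4"},
}

print(combined_similarity(doc_a, doc_b))
```

A score like this could be converted to a distance (for example `1 - similarity`) and passed to a clustering algorithm to compare representations, which is how the abstract says the method is evaluated.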
Keywords/Search Tags:Data Mining, Document clustering, Document Representation, Social Characteristics, Content Characteristics