Research And Application Of Real-time Assessment Of Information Credibility In Social Network

Posted on: 2018-11-23
Degree: Master
Type: Thesis
Country: China
Candidate: F Wang
GTID: 2348330512489056
Subject: Software engineering
Abstract/Summary:
With rapid urbanization, more and more people choose to live and work in developed cities, which poses major challenges for municipal management. An effective way to collect citizens' feedback on urban services is therefore essential for urban planning and construction. Twitter, a popular social media platform with a large user base, lets users publish tweets about events happening around them, and these tweets make it possible to extract requests for urban services. However, Twitter is a one-to-many instant messaging platform: although it makes information easy to publish and spread, it also accelerates the spread of fake information, which reduces the value of social data. The platform does offer a way to filter fake information manually, but doing so is extremely time-consuming. We therefore propose an automated framework for assessing tweet credibility. Our main contributions to assessing the credibility of individual tweets are as follows.

First, we propose a topic detection method based on KLD (Kullback-Leibler divergence). Because the number of topics in a tweet dataset is not known in advance, clustering methods that require a fixed number of clusters are hard to apply. Instead, tweets are classified into topics by computing text similarity based on KLD.

Second, our approach combines multi-level features, namely user-level, content-level and word-level features, to build the credibility model; the word-level features extend existing credibility models. The content of each tweet is tokenized and converted into a feature vector of N-gram tokens, each attribute of the vector is weighted by TF-IDF (term frequency-inverse document frequency), and the weighted vector serves as the word-level features. The Random Forest algorithm is used to build the credibility model, and experiments show a clear improvement in the model's performance.

Third, we implement a real-time credibility assessment system based on this model. As a sub-project of the smart city project CityFeed, the system improves urban services by incorporating crowdsourcing and compensates for CityFeed's lack of credibility assessment. Because the system would otherwise lose data whenever it cannot process incoming tweets in time during peak loads, it combines Kafka with Apache Storm to keep credibility assessment within acceptable computation times. In addition, the system provides map services and visual analysis of hotspot regions for municipal officers, giving them a basis for allocating urban resources.
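The abstract does not say how KLD is turned into topic assignments. The following is a minimal sketch, assuming smoothed unigram word distributions, a symmetrized KLD (plain KLD is asymmetric), and a single-pass rule: a tweet joins the nearest existing topic when the divergence falls below a threshold and otherwise opens a new topic, so the number of topics need not be fixed beforehand. The smoothing constant `alpha`, the `threshold` value, and all function names are illustrative assumptions, not the thesis's actual method.

```python
import math
from collections import Counter


def word_distribution(tokens, vocab, alpha=0.01):
    """Unigram distribution over a shared vocabulary.

    Assumption: additive smoothing with `alpha`, so no probability is zero
    and the KL divergence stays finite.
    """
    counts = Counter(tokens)
    total = len(tokens) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}


def kl_divergence(p, q):
    """D_KL(P || Q) = sum_w P(w) * log(P(w) / Q(w)) over the shared vocabulary."""
    return sum(p[w] * math.log(p[w] / q[w]) for w in p)


def symmetric_kld(p, q):
    """Assumption: average both directions, since plain KLD is asymmetric."""
    return 0.5 * (kl_divergence(p, q) + kl_divergence(q, p))


def detect_topics(tweets, threshold=2.0, alpha=0.01):
    """Single-pass topic assignment (hypothetical helper, not the thesis code).

    Each tweet joins the closest existing topic if the divergence is below
    `threshold`; otherwise it starts a new topic, so the number of topics is
    not fixed in advance. Returns a list of topics, each a list of tweet indices.
    """
    tokenized = [t.lower().split() for t in tweets]
    vocab = sorted({w for toks in tokenized for w in toks})
    topics = []        # tweet indices per topic
    topic_tokens = []  # pooled tokens per topic, used as the topic's distribution
    for i, toks in enumerate(tokenized):
        p = word_distribution(toks, vocab, alpha)
        best, best_d = None, float("inf")
        for k, pooled in enumerate(topic_tokens):
            d = symmetric_kld(p, word_distribution(pooled, vocab, alpha))
            if d < best_d:
                best, best_d = k, d
        if best is not None and best_d < threshold:
            topics[best].append(i)
            topic_tokens[best].extend(toks)
        else:
            topics.append([i])
            topic_tokens.append(list(toks))
    return topics
```

For example, `detect_topics(["pothole on main street", "huge pothole main street", "bus is late again", "late bus again today"])` returns `[[0, 1], [2, 3]]`: tweets sharing three of four words have a symmetric KLD of about 1.1, while unrelated tweets score above 4. The vocabulary is built from the whole batch here for simplicity; a streaming system would need to grow it incrementally.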
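The word-level feature construction described above (tokenize, form N-grams, weight each attribute by TF-IDF) can be sketched as follows. Whitespace tokenization, `n_max=2`, and the unsmoothed `log(N / df)` IDF are assumptions; the thesis does not state which tokenizer, N-gram range, or TF-IDF variant it uses, and the function names are hypothetical.

```python
import math
from collections import Counter


def ngrams(text, n_max=2):
    """Lowercase, split on whitespace, and return all 1..n_max-grams as strings."""
    tokens = text.lower().split()
    grams = []
    for n in range(1, n_max + 1):
        grams.extend(" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return grams


def tfidf_vectors(docs, n_max=2):
    """Return (vocabulary, one TF-IDF weighted vector per document).

    Assumed variant: tf = count / number of n-grams in the document,
    idf = log(N / df), where df is the number of documents containing the n-gram.
    """
    grams_per_doc = [ngrams(d, n_max) for d in docs]
    vocab = sorted({g for grams in grams_per_doc for g in grams})
    n_docs = len(docs)
    # Document frequency: count each n-gram at most once per document.
    df = Counter(g for grams in grams_per_doc for g in set(grams))
    vectors = []
    for grams in grams_per_doc:
        counts = Counter(grams)
        total = len(grams) or 1  # avoid division by zero for empty tweets
        vectors.append([(counts[g] / total) * math.log(n_docs / df[g]) for g in vocab])
    return vocab, vectors
```

An N-gram that appears in every tweet gets an IDF of zero and so carries no weight, while rarer N-grams such as "road closed" are emphasized. In a pipeline like the one described, these vectors would presumably be combined with the user-level and content-level features before training the Random Forest. In practice, scikit-learn's `TfidfVectorizer` with an `ngram_range` argument offers the same construction, though it uses a smoothed IDF by default.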
Keywords/Search Tags: Twitter, credibility, KLD, Random Forest, crowdsourcing