
Research On The Analysis And Application Of Value Account In Twitter

Posted on: 2019-07-13 | Degree: Master | Type: Thesis
Country: China | Candidate: W Y Zhu
GTID: 2348330569487725 | Subject: Information and Communication Engineering
Abstract/Summary:
With the rise of social networks, it has become ever easier for people to communicate with one another and to obtain information, and more and more people use these networks. Twitter is one of the most active social networking sites and has a very large number of registered accounts of many types. Some are spam accounts; others are value accounts, from which users can obtain useful information on topics that interest them. How to provide users with the information they need as comprehensively as possible has therefore become a hot research topic for scholars in academia and industry in recent years. This thesis focuses on the category classification of English-language value accounts on Twitter and makes two contributions:

(1) A feature extraction method based on multiple attribute features of Twitter value accounts. Whereas the traditional method uses only keyword features, this thesis also extracts named entity features, topic tag (hashtag) features, URL features, and numeric features from value accounts; combining all of these features for classification gives more comprehensive and accurate results. For feature selection, the thesis proposes an improved method based on information gain. Compared with the original method, it makes improvements in three aspects: feature frequency, the degree of dispersion within a class, and the difference between classes. Compared with the Top-K method commonly used for domain classification of value accounts, the improved feature selection method selects more representative text features from value accounts and improves the classification results.

(2) A classification method for Twitter value accounts based on semi-supervised collaborative training (co-training). Previous studies generally classified value accounts with supervised learning methods, which require a large manually labeled training set that is costly to build. Semi-supervised collaborative training instead combines a set of manually labeled accounts with unlabeled accounts to perform account classification, and achieves better classification results. Because the standard Tri-Training algorithm is not well suited to the setting of this thesis, an Improved Tri-Training algorithm is proposed. It uses naive Bayes as the base classifier and applies the Random Subspace method so that the base classifiers differ from the start. A confidence threshold, combined with voting on the outputs of two base classifiers, determines whether an unlabeled account is added to the training set of the third base classifier. The final classification result is obtained by majority voting together with thresholding.
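The abstract names the attribute types used for value accounts (keywords, named entities, topic tags, URLs, numbers) but not the extraction rules. The sketch below is a minimal illustration under that gap: the regular expressions, the stopword list, and the function names `tweet_features` and `account_features` are hypothetical, and runs of capitalized words serve only as a crude stand-in for a real named-entity recognizer.

```python
import re
from collections import Counter
from urllib.parse import urlparse

# Hypothetical extraction rules; the thesis does not state its own.
URL_RE = re.compile(r"https?://\S+")
HASHTAG_RE = re.compile(r"#\w+")
NUMBER_RE = re.compile(r"\b\d+(?:\.\d+)?\b")
# Crude named-entity stand-in: runs of capitalized words (not a real NER model).
ENTITY_RE = re.compile(r"\b[A-Z][a-z]+(?:\s+[A-Z][a-z]+)*\b")
WORD_RE = re.compile(r"[a-z]+")
STOPWORDS = {"the", "and", "for", "with", "this", "that", "are", "was"}


def tweet_features(tweet: str) -> Counter:
    """Map one tweet to a bag of type-prefixed features."""
    feats = Counter()
    for url in URL_RE.findall(tweet):
        feats["url:" + urlparse(url).netloc.lower()] += 1  # keep only the domain
    text = URL_RE.sub(" ", tweet)
    for tag in HASHTAG_RE.findall(text):
        feats["tag:" + tag.lower()] += 1
    text = HASHTAG_RE.sub(" ", text)  # avoid counting hashtags again as keywords
    for ent in ENTITY_RE.findall(text):
        feats["ent:" + ent] += 1
    for num in NUMBER_RE.findall(text):
        feats["num:" + num] += 1
    for word in WORD_RE.findall(text.lower()):
        if len(word) > 2 and word not in STOPWORDS:
            feats["kw:" + word] += 1
    return feats


def account_features(tweets) -> Counter:
    """Aggregate tweet-level features into one account-level bag."""
    return sum((tweet_features(t) for t in tweets), Counter())
```

Prefixing each feature with its type keeps, for example, a hashtag distinct from an identical keyword, so a later selection step can score every attribute type in one shared feature space.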
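The improved feature selection builds on information gain, IG(t) = H(C) - [P(t) H(C | t) + P(not t) H(C | not t)], where C is the account category and t a feature. The abstract does not give formulas for its three refinements (feature frequency, within-class dispersion, between-class difference), so the sketch below shows only the standard criterion it starts from, followed by picking the k highest-scoring features. The function names and the set-of-features document representation are assumptions for illustration.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy (bits) of a list of class counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)


def information_gain(docs, labels, term):
    """Standard IG of a binary 'term present' feature; docs are sets of features.

    The thesis's three refinements are not specified, so they are not modeled here.
    """
    classes = sorted(set(labels))
    with_t = Counter(lab for d, lab in zip(docs, labels) if term in d)
    without_t = Counter(lab for d, lab in zip(docs, labels) if term not in d)
    n, n_t = len(docs), sum(with_t.values())
    h_c = entropy([labels.count(c) for c in classes])
    h_cond = (n_t / n) * entropy([with_t[c] for c in classes]) + (
        (n - n_t) / n
    ) * entropy([without_t[c] for c in classes])
    return h_c - h_cond


def select_features(docs, labels, k):
    """Keep the k features with the highest IG (ties broken alphabetically)."""
    vocab = set().union(*docs)
    ranked = sorted(vocab, key=lambda t: (-information_gain(docs, labels, t), t))
    return ranked[:k]
```

A feature that appears only in one category, such as "goal" in sports accounts, removes all class uncertainty and scores the maximum IG; an improved criterion of the kind the thesis describes would replace `information_gain` while keeping the same ranking step.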
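The Improved Tri-Training procedure in contribution (2) can be sketched as follows, under stated assumptions. Three multinomial naive Bayes classifiers each see a random feature subspace (the standard Tri-Training algorithm instead diversifies its classifiers with bootstrap samples). An unlabeled account is added to classifier i's training set only when the other two classifiers agree and both exceed a confidence threshold, and the final label is a majority vote. The class and function names, the threshold of 0.9, the subspace ratio of 0.7, the stopping rule, and the tie-break are hypothetical choices, not values from the thesis.

```python
import math
import random
from collections import Counter


class NaiveBayes:
    """Multinomial naive Bayes (Laplace smoothing) restricted to one feature subspace."""

    def __init__(self, features, alpha=1.0):
        self.features, self.alpha = set(features), alpha

    def fit(self, X, y):
        self.classes = sorted(set(y))
        self.log_prior = {c: math.log(y.count(c) / len(y)) for c in self.classes}
        counts = {c: Counter() for c in self.classes}
        for x, c in zip(X, y):
            for f, v in x.items():
                if f in self.features:  # ignore features outside this subspace
                    counts[c][f] += v
        self.log_lik, self.log_unseen = {}, {}
        for c in self.classes:
            total = sum(counts[c].values()) + self.alpha * len(self.features)
            self.log_lik[c] = {f: math.log((n + self.alpha) / total) for f, n in counts[c].items()}
            self.log_unseen[c] = math.log(self.alpha / total)
        return self

    def predict(self, x):
        """Return (label, posterior probability of that label)."""
        scores = {}
        for c in self.classes:
            s = self.log_prior[c]
            for f, v in x.items():
                if f in self.features:
                    s += v * self.log_lik[c].get(f, self.log_unseen[c])
            scores[c] = s
        top = max(scores.values())
        z = sum(math.exp(s - top) for s in scores.values())
        label = max(scores, key=scores.get)
        return label, math.exp(scores[label] - top) / z


def tri_train(X_lab, y_lab, X_unlab, ratio=0.7, threshold=0.9, max_rounds=10, seed=0):
    """Hypothetical sketch of the Improved Tri-Training described in the abstract."""
    rng = random.Random(seed)
    vocab = sorted(set().union(*(x.keys() for x in X_lab + X_unlab)))
    size = max(1, int(len(vocab) * ratio))
    # Random Subspace: each base classifier starts from its own random feature subset.
    subspaces = [rng.sample(vocab, size) for _ in range(3)]
    train = [(list(X_lab), list(y_lab)) for _ in range(3)]
    used = [set() for _ in range(3)]  # unlabeled indices already given to classifier i
    clfs = [NaiveBayes(s).fit(*t) for s, t in zip(subspaces, train)]
    for _ in range(max_rounds):
        new = [[] for _ in range(3)]
        for i in range(3):
            j, k = (m for m in range(3) if m != i)
            for u, x in enumerate(X_unlab):
                if u in used[i]:
                    continue
                (lj, pj), (lk, pk) = clfs[j].predict(x), clfs[k].predict(x)
                # The other two classifiers must agree and both be confident.
                if lj == lk and min(pj, pk) >= threshold:
                    new[i].append((u, x, lj))
        if not any(new):
            break  # no classifier received new pseudo-labels: stop
        for i in range(3):
            for u, x, label in new[i]:
                used[i].add(u)
                train[i][0].append(x)
                train[i][1].append(label)
            if new[i]:
                clfs[i] = NaiveBayes(subspaces[i]).fit(*train[i])
    return clfs


def tri_predict(clfs, x):
    """Majority vote; on a three-way split, trust the most confident classifier (assumption)."""
    votes = [clf.predict(x) for clf in clfs]
    label, n = Counter(lab for lab, _ in votes).most_common(1)[0]
    if n >= 2:
        return label
    return max(votes, key=lambda v: v[1])[0]
```

Inputs are bags of features such as those produced by an extraction step like the first sketch. The confidence threshold is what keeps a classifier whose subspace happens to miss an account's informative features (and therefore falls back to its class prior) from passing a weak pseudo-label to the third classifier.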
Keywords/Search Tags: Twitter, interest-based account classification, information gain, cooperative training