
Research And Implementation On Weibo Authorship Identification Based On Deep Learning

Posted on: 2021-01-16
Degree: Master
Type: Thesis
Country: China
Candidate: W B Tang
Full Text: PDF
GTID: 2428330632962800
Subject: Information security
Abstract/Summary:
With the rapid development of the Internet, people increasingly rely on social networks and online communication tools to interact. Because online social network users can remain anonymous and concealed, the Internet has also become a new venue for criminals to conduct illegal transactions. Identifying the true identity of users who commit crimes online has therefore become a top priority for criminal investigation agencies handling cybercrime cases. However, criminals often use false information to avoid detection, so it is very difficult to determine an author's true identity from the user's registration information. This poses new challenges for online authorship identification research.

In recent years, authorship identification has played a significant role in public security, and deep learning based approaches have recently been applied to it. However, all approaches based on deep learning require a large amount of original data, while a given author posts only a limited number of texts that can serve as positive samples. This leads to missing data and class imbalance, which cause the classifier to overfit on the small dataset. Such approaches can also be biased by the prior of the data. To address these issues, this thesis uses Wasserstein generative adversarial networks (WGANs) to generate samples for the positive class, which lacks data, and presents a novel data augmentation framework for authorship identification. The generated samples are mixed with the original features in the dataset to form an augmented training dataset. As a result, the classifier trained on it does not suffer from overfitting and class imbalance, and its performance improves substantially. Using a dataset crawled from Sina Weibo, we evaluate this data augmentation empirically. The experimental results show that our method improves accuracy by 14% compared with strong baselines. We further validate its effectiveness through a set of comparison experiments.

Based on this method, this thesis designs and implements a Chinese microblog author identity authentication system. The system mainly comprises a data extraction module, a feature extraction module, a user identification module, a data enhancement module, and a database storage module, and the main functions and processes of each module are introduced in detail. Finally, testing of the system demonstrates the usability and stability of its functions. The results show that the system can effectively authenticate the identity of Chinese microblog authors and can help public security agencies and other organizations accurately identify anonymous microblog accounts, thereby contributing to information security in social networks.
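
The mixing step described in the abstract (generated positive-class samples combined with the original features to form an augmented training set) can be illustrated with a minimal sketch. It is not the thesis's implementation: the function names `augment_positive_class` and `toy_generator`, the balancing rule (generate until both classes are the same size), and the two-dimensional feature vectors are all assumptions made for illustration, and `toy_generator` is only a stand-in for a trained WGAN generator.

```python
import random
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def augment_positive_class(
    features: Sequence[Vector],
    labels: Sequence[int],
    generate: Callable[[int], List[Vector]],
    positive_label: int = 1,
    seed: int = 0,
) -> Tuple[List[Vector], List[int]]:
    """Mix generated positive samples with the original data.

    Assumption: enough samples are generated to make the positive class
    as large as the remaining data; the source does not state the ratio.
    """
    n_pos = sum(1 for y in labels if y == positive_label)
    n_neg = len(labels) - n_pos
    n_missing = max(0, n_neg - n_pos)

    # Ask the generator only for the number of samples that are missing.
    synthetic = generate(n_missing) if n_missing else []

    # Original pairs first, then synthetic vectors labelled as positive.
    mixed = list(zip(features, labels))
    mixed += [(x, positive_label) for x in synthetic]

    # Shuffle so generated and original samples are interleaved.
    random.Random(seed).shuffle(mixed)
    xs = [x for x, _ in mixed]
    ys = [y for _, y in mixed]
    return xs, ys


def toy_generator(n: int) -> List[Vector]:
    """Hypothetical stand-in for a trained WGAN generator.

    It returns Gaussian noise around a fixed centre instead of samples
    from a learned distribution.
    """
    rng = random.Random(42)
    return [[1.0 + rng.gauss(0, 0.1), 1.0 + rng.gauss(0, 0.1)] for _ in range(n)]


if __name__ == "__main__":
    # Two positive samples (the target author) against eight negatives.
    X = [[1.0, 1.1], [0.9, 1.0]] + [[-1.0, -1.0 + 0.01 * i] for i in range(8)]
    y = [1, 1] + [0] * 8
    X_aug, y_aug = augment_positive_class(X, y, toy_generator)
    print(len(X_aug), sum(y_aug))
```

Passing the generator as a callable keeps the mixing logic independent of how the samples are produced, so a trained generator could replace `toy_generator` without changing `augment_positive_class`.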
Keywords/Search Tags: Authorship Identification, Wasserstein Generative Adversarial Networks, Data Imbalance