Research And Implementation On Weibo Authorship Identification Based On Deep Learning

Posted on:2021-01-16

Degree:Master

Type:Thesis

Country:China

Candidate:W B Tang

Full Text:PDF

GTID:2428330632962800

Subject:Information security

Abstract/Summary:

With the rapid development of the Internet, people increasingly rely on social networks and online communication tools to interact. Because users of online social networks can remain anonymous and concealed, the Internet has also become a new venue for criminals to conduct illegal transactions. Identifying the true identity of users who commit crimes online has therefore become a top priority for criminal investigation agencies handling cybercrime cases. However, criminals often register with false information to avoid detection, so it is very difficult to determine an author's true identity from registration information alone, which poses new challenges for research on online authorship identification.

In recent years, authorship identification has played a significant role in public security, and deep learning approaches have recently been applied to it. However, all deep learning approaches require a large amount of original data, while a given author posts only a limited number of texts that can serve as positive samples. This leads to missing data and class imbalance, which cause the classifier to overfit on the small dataset. Such approaches can also be biased by the prior distribution of the data. To address these issues, this thesis uses Wasserstein generative adversarial networks (WGANs) to generate samples for the data-poor positive class and presents a novel data augmentation framework for authorship identification. The generated samples are mixed with the original features in the dataset to form an augmented training dataset. A classifier trained on this dataset is less prone to overfitting and class imbalance, and its performance improves substantially. We evaluate this data augmentation empirically on a dataset crawled from Sina Weibo. The experimental results show that our method improves accuracy by 14% over strong baselines, and a set of comparison experiments further validates its effectiveness.

Based on this method, this thesis designs and implements a Chinese microblog author identity authentication system. The system consists mainly of a data extraction module, a feature extraction module, a user identification module, a data enhancement module, and a database storage module, and the main functions and processes of each module are described in detail. Finally, testing of the system demonstrates the usability and stability of its functions. The results show that the system can effectively authenticate the identity of Chinese microblog authors and can help public security agencies and other organizations accurately identify anonymous microblog accounts, thereby supporting information security in social networks.
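The augmentation step described in the abstract (mixing generated positive-class samples with the original features) can be sketched roughly as follows. This is a minimal illustration, not the thesis implementation: the function names `augment_positive_class` and `toy_generator`, the feature dimension, and the choice to generate exactly enough samples to balance the two classes are all assumptions. In particular, `toy_generator` only emits Gaussian noise as a stand-in for sampling from a trained WGAN generator.

```python
import random


def augment_positive_class(features, labels, generate, seed=0):
    """Return a class-balanced copy of (features, labels).

    `generate(n, rng)` is a hypothetical stand-in for drawing n feature
    vectors from a trained WGAN generator; it is an assumption of this
    sketch, not an interface taken from the thesis.
    """
    rng = random.Random(seed)
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    # Generate only as many positives as are needed to match the negatives.
    n_needed = max(0, n_neg - n_pos)
    synthetic = generate(n_needed, rng)
    # Mix the generated samples with the original features, then shuffle.
    mixed = list(zip(features, labels)) + [(x, 1) for x in synthetic]
    rng.shuffle(mixed)
    aug_features = [x for x, _ in mixed]
    aug_labels = [y for _, y in mixed]
    return aug_features, aug_labels


def toy_generator(n, rng, dim=4):
    # Placeholder: plain Gaussian vectors. A real system would pass noise
    # through the trained generator network instead.
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n)]


# Toy imbalanced dataset: 2 positive (target author) and 8 negative samples.
features = [[float(i)] * 4 for i in range(10)]
labels = [1, 1] + [0] * 8

aug_x, aug_y = augment_positive_class(features, labels, toy_generator)
print(sum(aug_y), len(aug_y) - sum(aug_y))  # prints "8 8"
```

Passing the generator in as a callable keeps the mixing logic independent of how the samples are produced, so the placeholder can be swapped for a trained model without changing the rest of the code.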

Keywords/Search Tags:

Authorship Identification, Wasserstein Generative Adversarial Networks, Data Imbalance

Related items

1	Image Generating And Its Application With Wasserstein Generative Recurrent Adversarial Networks
2	Research On Speech Dereverberation Based On Improved Wasserstein Generative Adversarial Networks
3	Research On Speech Enhancement Based On Wasserstein Generative Adversarial Networks
4	A Data Augmentation Method For Image Class Imbalance Problem Using Generative Adversarial Networks
5	Research On Image Repair And Reconstruction Based On Generative Adversarial Networks
6	Speech Enhancement Algorithm Based On Generative Adversarial Network
7	Conditional Bidirectional Learn And Inference Based On Wasserstein Distance
8	Research On Classification Method Of Imbalanced Data Set Based On Generative Adversarial Network
9	Face Inpainting Based On Residual-Wasserstein Generative Adversarial Networks
10	Coverless Information Hiding Based On Deep Generate Model