
Deep Learning Based Methods For Authorship Attribution

Posted on: 2022-08-18
Degree: Master
Type: Thesis
Country: China
Candidate: Z Q Hu
GTID: 2518306524489434
Subject: Master of Engineering
Abstract/Summary:
Authorship Attribution (AA) is the task of predicting the author of a given text by learning the author’s unique writing style. The spread of computers and the Internet has fundamentally changed people’s lifestyles, as well as the way we generate and receive information. Countless text fragments and documents are generated every millisecond: this is the information age. AA is one of the key methods for turning this large volume of data from a burden into practical, useful knowledge. By studying the linguistic traces that authors leave in their writing, authorship attribution aims to reveal the identity and sociolinguistic characteristics of potential authors. Advances in authorship analysis supported by stylometric techniques have had a significant impact on many fields, such as online crime investigation, marketing and social network analysis, and literary and science education.

To solve the authorship attribution problem, researchers have designed methods that capture an author’s writing style through lexical, character, syntactic, semantic, and other stylistic features, and then determine the author of a text by comparing the stylistic features of the text of unknown authorship with those of the known authors. Traditionally, feature engineering has been used to hand-craft stylistic features from long documents such as e-mails and news articles; these extracted features are then used to train a classifier that identifies the author of an input document. Commonly used stylistic features include character n-grams, lexical features, syntactic features, and document topic. In recent years, studies have also explored various deep learning methods for authorship attribution. Some methods use convolutional neural networks (CNNs) to extract character-level features from text and train classifiers on those features, while another uses a multi-headed recurrent neural network (RNN) to learn character-level style representations of authors. However, these two studies applied deep learning only to authorship attribution on long documents; they have not yet been evaluated on short texts from social media.

This thesis focuses on applying deep learning techniques, such as Siamese networks, metric learning, and graph convolutional networks (GCNs), to the authorship attribution task in order to improve the performance of authorship attribution models. It first reviews recent developments in authorship attribution. It then presents two deep learning methods for the task. The first is based on user style embeddings: it extracts an author’s writing-style features by aggregating the features of each of that author’s texts. The second combines a graph convolutional network with syntactic dependency trees and focuses on extracting and exploiting syntactic features. Finally, based on the current state of research, the thesis discusses directions in authorship attribution that merit further study.
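The traditional pipeline described in the abstract (hand-crafted stylistic features, then a comparison between the unknown text and the known authors) can be illustrated with character n-grams. The sketch below is an assumed minimal baseline, not a method from this thesis: it pools each candidate author’s texts into a single trigram count profile and assigns the unknown text to the author whose profile has the highest cosine similarity. The function names `char_ngrams`, `cosine`, and `attribute` and the toy texts are hypothetical.

```python
import math
from collections import Counter


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams after lowercasing the text."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse n-gram count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items() if gram in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # empty profile (text shorter than n): no evidence either way
    return dot / (norm_a * norm_b)


def attribute(unknown: str, known: dict[str, list[str]], n: int = 3) -> str:
    """Return the candidate whose pooled n-gram profile is closest to `unknown`."""
    query = char_ngrams(unknown, n)
    profiles = {}
    for author, texts in known.items():
        profile = Counter()
        for text in texts:
            profile.update(char_ngrams(text, n))  # pool all of this author's texts
        profiles[author] = profile
    return max(profiles, key=lambda author: cosine(query, profiles[author]))


# Hypothetical toy corpus with two stylistically distinct "authors".
known = {
    "alice": ["I reckon the weather is lovely today.", "I reckon we should go."],
    "bob": ["LOL that game was sick!!!", "omg lol sick play"],
}
print(attribute("I reckon it is lovely.", known))  # alice
```

Character n-grams are used here only because the abstract lists them among the most common stylistic features; a practical system would typically add normalization (e.g. TF-IDF weighting) and a trained classifier rather than nearest-profile matching.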
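The first method builds a user style embedding by aggregating per-text features, and the keywords point to Siamese networks and metric learning. The abstract does not specify the text encoder, the aggregation, or the loss, so the sketch below assumes the simplest choices: per-text style vectors are taken as given (in practice they would come from a trained encoder), a user embedding is their mean, a Siamese pair is scored with the classic margin-based contrastive loss (Hadsell, Chopra and LeCun), and a query text is assigned to the nearest user embedding by Euclidean distance. All names (`mean_pool`, `contrastive_loss`, `nearest_user`) and the toy vectors are hypothetical.

```python
import math

Vector = list[float]


def mean_pool(text_vectors: list[Vector]) -> Vector:
    """Aggregate one user's per-text style vectors into a user embedding by averaging."""
    dim = len(text_vectors[0])
    return [sum(v[i] for v in text_vectors) / len(text_vectors) for i in range(dim)]


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance, the metric the contrastive loss below operates on."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def contrastive_loss(a: Vector, b: Vector, same_author: bool, margin: float = 1.0) -> float:
    """Contrastive loss for one Siamese pair: same-author pairs are pulled together,
    different-author pairs are pushed until they are at least `margin` apart."""
    d = euclidean(a, b)
    if same_author:
        return 0.5 * d ** 2
    return 0.5 * max(0.0, margin - d) ** 2


def nearest_user(query: Vector, users: dict[str, Vector]) -> str:
    """Attribute a text embedding to the user embedding at the smallest distance."""
    return min(users, key=lambda user: euclidean(query, users[user]))


# Hypothetical 2-d style vectors for two users with two texts each.
users = {
    "u1": mean_pool([[1.0, 0.0], [0.8, 0.2]]),  # roughly [0.9, 0.1]
    "u2": mean_pool([[0.0, 1.0], [0.2, 0.8]]),  # roughly [0.1, 0.9]
}
print(nearest_user([0.7, 0.3], users))  # u1
print(contrastive_loss([0.0, 0.0], [3.0, 4.0], same_author=True))   # 12.5
print(contrastive_loss([0.0, 0.0], [3.0, 4.0], same_author=False))  # 0.0
```

Mean pooling and the contrastive loss are stand-ins; attention-based aggregation or a triplet loss would fit the same outline, and the thesis may use either.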
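The second method combines a GCN with syntactic dependency trees. As an assumed illustration (the abstract does not give the thesis’s exact architecture), the sketch below turns a parse’s head indices into an undirected adjacency matrix with self-loops, applies one graph convolution with the symmetric normalization of Kipf and Welling, ReLU(D^-1/2 Â D^-1/2 H W), and mean-pools the token vectors into one sentence-level vector. The parse, token features, weights, and all function names are hypothetical toy values; a real system would obtain heads from a dependency parser and learn W.

```python
import math

Matrix = list[list[float]]


def dependency_adjacency(heads: list[int]) -> Matrix:
    """Adjacency matrix with self-loops from head indices.
    heads[i] is the index of token i's syntactic head, or -1 for the root."""
    n = len(heads)
    adj = [[0.0] * n for _ in range(n)]
    for i, head in enumerate(heads):
        adj[i][i] = 1.0  # self-loop so each token keeps its own features
        if head >= 0:
            adj[i][head] = adj[head][i] = 1.0  # treat the arc as an undirected edge
    return adj


def matmul(x: Matrix, y: Matrix) -> Matrix:
    """Plain matrix product of an (n x m) and an (m x p) matrix."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]


def gcn_layer(adj: Matrix, h: Matrix, w: Matrix) -> Matrix:
    """One GCN layer: ReLU(D^-1/2 A D^-1/2 H W); degrees are >= 1 thanks to self-loops."""
    deg = [sum(row) for row in adj]
    n = len(adj)
    norm = [[adj[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    out = matmul(matmul(norm, h), w)
    return [[max(0.0, v) for v in row] for row in out]


def sentence_vector(h: Matrix) -> list[float]:
    """Mean-pool token representations into one sentence-level syntactic style vector."""
    return [sum(col) / len(h) for col in zip(*h)]


# Hypothetical parse of "the cat sat": "the" -> "cat", "cat" -> "sat", "sat" is the root.
heads = [1, 2, -1]
adj = dependency_adjacency(heads)
h = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy token features
w = [[0.5, -0.5], [0.5, 0.5]]            # toy weight matrix
out = gcn_layer(adj, h, w)
print(sentence_vector(out))
```

Each layer mixes a token’s features with those of its syntactic neighbours, so stacking layers lets the pooled vector reflect progressively larger subtrees of the parse.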
Keywords/Search Tags: Authorship Attribution, Siamese Network, Metric Learning, Style Embedding, Dependency Tree