
A Study On Authorship Attribution For Chinese Instant Messages Based On Dependency Grammar

Posted on: 2021-04-14
Degree: Master
Type: Thesis
Country: China
Candidate: Z T Liu
GTID: 2415330626459513
Subject: Foreign Linguistics and Applied Linguistics
Abstract/Summary:
Forensic authorship attribution has a standing need for useful discriminating features, especially for instant messages, one of the most challenging genres of text and one frequently involved in forensic cases. The present study therefore tests the discriminating power of a series of syntactic features, extracted on the basis of dependency grammar, for authorship attribution of Chinese instant messages, in an attempt to provide more candidate features for use in forensic authorship attribution tasks. The proposed features are mean dependency distance, mean hierarchical distance, and the relative frequency of each dependency relation type. Methodologically, a series of classification experiments was conducted to demonstrate the discriminating power of the proposed features: features extracted from manually annotated WeChat messages, naturally produced by both sociolinguistically similar and sociolinguistically diverse authors, were input into a classification algorithm to train models, on the basis of which the discriminating power of the features was evaluated. Different feature sets and different author combinations were considered in the experiments, and in-depth syntactic analysis was conducted to further discuss the experimental results. Statistically significant results demonstrate that these features have discriminating power for authorship attribution of Chinese instant messages. The features make different contributions in both the sociolinguistically similar and the sociolinguistically diverse authorship attribution tasks. Notably, the features perform satisfactorily in a case involving up to five sociolinguistically similar authors. Furthermore, a feature set containing more features and an author combination containing fewer authors lead to better results. Finally, in-depth syntactic analysis of some representative sentences from the authors elucidates the specific roles that the features play in different authorship attribution tasks and suggests that syntactic alignment may be a linguistic mechanism underlying the features’ discriminating power.
Keywords/Search Tags:authorship attribution, dependency grammar, instant messages
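
The three proposed features can be illustrated with a short sketch. The abstract does not give the thesis's exact formulas or annotation scheme, so what follows assumes the definitions commonly used in quantitative dependency syntax: dependency distance is the absolute difference between the linear positions of a dependent and its head, hierarchical distance is the number of arcs from a word up to the root, and both means are taken over non-root words. The function names, the example sentence, and the Universal Dependencies style labels are hypothetical and chosen only for illustration; counting the root relation in the relative frequencies is likewise an assumption.

```python
from collections import Counter

# A sentence is given as a list of head positions (1-indexed; 0 marks the root)
# and a parallel list of relation labels. This input format is an assumption.

def mean_dependency_distance(heads):
    """Mean |head position - dependent position| over non-root words."""
    dists = [abs(h - i) for i, h in enumerate(heads, start=1) if h != 0]
    return sum(dists) / len(dists) if dists else 0.0

def hierarchical_distance(heads, i):
    """Number of arcs on the path from word i up to the root."""
    depth = 0
    while heads[i - 1] != 0:
        i = heads[i - 1]
        depth += 1
        if depth > len(heads):  # guard against malformed (cyclic) input
            raise ValueError("head list does not form a tree")
    return depth

def mean_hierarchical_distance(heads):
    """Mean hierarchical distance over non-root words."""
    depths = [hierarchical_distance(heads, i)
              for i in range(1, len(heads) + 1) if heads[i - 1] != 0]
    return sum(depths) / len(depths) if depths else 0.0

def relation_frequencies(relations):
    """Relative frequency of each dependency relation type."""
    counts = Counter(relations)
    total = sum(counts.values())
    return {rel: n / total for rel, n in counts.items()}

# Hypothetical example: 他(1) 说(2) 我(3) 喜欢(4) 苹果(5), "He says I like apples"
heads = [2, 0, 4, 2, 4]
relations = ["nsubj", "root", "nsubj", "ccomp", "obj"]

features = {
    "mdd": mean_dependency_distance(heads),
    "mhd": mean_hierarchical_distance(heads),
    **relation_frequencies(relations),
}
print(features)
```

In an attribution experiment of the kind described, one such feature dictionary per message (or per message sample) would form a row of the input to a classifier. The abstract does not name the classification algorithm, so none is assumed here.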