
Authorship Attribution Of Chinese Texts

Posted on: 2020-07-31
Degree: Master
Type: Thesis
Country: China
Candidate: X L Xu
GTID: 2428330596468998
Subject: Public Security Technology
Abstract/Summary:
Authorship identification is an active research topic in natural language processing with broad prospects. In information security, it can help protect the copyright of works. In public security and document examination, it can help identify the authors of harmful information, providing leads and technical support for solving cases. The most important step in authorship identification is text feature extraction. Current approaches suffer from the lack of a uniform feature set, a high degree of human involvement, strong dependence on the corpus, and insufficient objectivity in feature screening. To automate feature extraction and improve recognition accuracy, this thesis makes the following contributions based on deep learning.

First, feature extraction is complex and does not generalize: modeling an author's language style requires different feature engineering for different corpora. To address this, the thesis proposes CABLSTM, a Chinese text authorship identification model that requires no expert feature modeling. To extract as many features as possible from short texts, the model combines the convolution operation of a convolutional neural network with an attention mechanism and removes the pooling layer so that features are not discarded; together these form the text feature extractor. The extractor's output is fed into a bidirectional long short-term memory (Bi-LSTM) network to capture contextual information, and the identification result is then output through a Softmax layer.

Second, the thesis designs and implements a text authorship identification system based on this model. The system analyzes the text under test, computing and outputting its keywords, key phrases, and abstract with the TankRank-LL algorithm proposed in this thesis. It also outputs the emotional tendency of the text using Baidu AI sentiment analysis, and the author identification result using the CABLSTM model.

Finally, the thesis carries out the following experiments on a corpus of Weibo texts: a word segmentation accuracy experiment; a comparison of traditional and deep learning algorithms for author identification; and a keyword extraction improvement experiment. Comparisons of accuracy, recall, and F value demonstrate the effectiveness of the algorithm, model, and system proposed in this thesis.
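The evaluation measures named in the abstract can be illustrated with a short example. This is a minimal sketch, not the thesis's evaluation code. It assumes that the "F value" is the F1 score (the harmonic mean of precision and recall) and that per-author scores are macro-averaged; the abstract states neither. The function name `evaluate` and the label format are hypothetical.

```python
from collections import Counter


def evaluate(true_labels, predicted_labels):
    """Return accuracy and macro-averaged precision, recall, and F1.

    Assumption: each text has exactly one true author and one predicted
    author, given as two parallel lists of author identifiers.
    """
    if not true_labels or len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must be non-empty and of equal length")

    true_count = Counter(true_labels)            # texts written by each author
    predicted_count = Counter(predicted_labels)  # texts attributed to each author
    correct = Counter(                           # correct attributions per author
        t for t, p in zip(true_labels, predicted_labels) if t == p
    )

    authors = sorted(set(true_labels) | set(predicted_labels))
    precisions, recalls, f1_scores = [], [], []
    for author in authors:
        # Guard against division by zero for authors never predicted or never seen.
        precision = (
            correct[author] / predicted_count[author]
            if predicted_count[author] else 0.0
        )
        recall = correct[author] / true_count[author] if true_count[author] else 0.0
        denominator = precision + recall
        f1 = 2 * precision * recall / denominator if denominator else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1_scores.append(f1)

    n_authors = len(authors)
    return {
        "accuracy": sum(correct.values()) / len(true_labels),
        "precision": sum(precisions) / n_authors,
        "recall": sum(recalls) / n_authors,
        "f1": sum(f1_scores) / n_authors,
    }
```

Macro-averaging weights every author equally, which keeps prolific authors from dominating the score when the number of texts per author is unbalanced. A micro-averaged variant would be an equally plausible reading of the abstract.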
Keywords/Search Tags: Author identification, Text feature, Deep learning, Identification system