Authorship Verification With Latent Dirichlet Allocation

Posted on:2014-05-15

Degree:Master

Type:Thesis

Country:China

Candidate:X J Meng

GTID:2268330392969072

Subject:Computer Science and Technology

Abstract/Summary:

With the development of the Internet, the way people communicate has changed greatly. Instant messaging, as a modern form of communication, is becoming more and more popular and has become a main channel in both work and daily life. However, it brings security holes as well as convenience. When we chat with someone through an instant messaging tool, we tend to overlook that person’s real identity. Criminals exploit this by stealing other people’s IDs and passwords and then chatting with us while posing as our friends. As a result, we may leak personal information or lose money.

In this paper, the main research question is how to prevent this security problem so that people can communicate safely. Today there are many instant messaging tools, such as MSN, AOL, and QQ. Although these tools provide some safety checks based on sensitive words such as bank, account, buy, and sell, in many cases the impostors not only defraud users of money but also want personal information for illegal transactions. Therefore, the essence of solving this problem is identifying the speaker’s identity correctly. People generally chat by text; although pictures, emoticons, and so on are also sent, text is the main form. The object of this paper is therefore text information, namely chat logs. We verify identity by judging the differences among people in their way of speaking and tone.

The main contributions of this paper are as follows. Firstly, considering the particular nature of instant messages, we extract only modal particles, punctuation, auxiliary words, and other words that carry little meaning on their own, ignoring nouns and adjectives. Secondly, feature extraction is no longer based on word frequency; instead we apply topic features. Thirdly, from the extracted topics we delete those that have little effect on the final classification result and retain only those that have a great effect. Fourthly, because the topic model considers only features of the text content, we add structural features to the topic model and use the mixed features to verify identity.

Experimental results show the following. Firstly, the topic model is appropriate for verifying identity. Secondly, the accuracy improves after the topics are sifted. Thirdly, the length of the text, the number of topics, and the feature extraction method all affect the final result.
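The feature pipeline described in the contributions can be sketched roughly as follows. This is only an illustration under stated assumptions, not the thesis's implementation: the abstract does not give the word list, the topic selection criterion, or the structural features, so `STYLE_WORDS`, the mean-gap ranking in `select_topics`, and the two features in `structural_features` are all hypothetical. The per-message topic proportions are assumed to come from any LDA implementation trained on the filtered tokens.

```python
import re
from statistics import mean

# Hypothetical English stand-in for the modal particles and auxiliary words
# mentioned in the abstract; the thesis's actual vocabulary is not given.
STYLE_WORDS = {"oh", "ah", "well", "hmm", "ok", "the", "a", "of", "to", "is", "so"}
TOKEN_RE = re.compile(r"[A-Za-z']+|[^\w\s]")


def style_tokens(message):
    """Keep style words and punctuation; drop content words such as nouns."""
    tokens = TOKEN_RE.findall(message.lower())
    return [t for t in tokens if t in STYLE_WORDS or not t[0].isalpha()]


def structural_features(message):
    """Assumed structure features: length in characters and punctuation share."""
    n = len(message)
    punct = sum(1 for ch in message if not ch.isalnum() and not ch.isspace())
    return [float(n), punct / n if n else 0.0]


def select_topics(author_docs, other_docs, keep):
    """Assumed criterion: keep the topics whose mean proportion differs most
    between the claimed author's messages and everyone else's."""
    n_topics = len(author_docs[0])
    gap = [
        abs(mean(d[k] for d in author_docs) - mean(d[k] for d in other_docs))
        for k in range(n_topics)
    ]
    ranked = sorted(range(n_topics), key=lambda k: gap[k], reverse=True)
    return sorted(ranked[:keep])


def mixed_features(topic_mix, kept_topics, message):
    """Concatenate the retained topic proportions with the structure features."""
    return [topic_mix[k] for k in kept_topics] + structural_features(message)


# Made-up topic proportions standing in for LDA output on filtered chat logs.
author_docs = [[0.7, 0.2, 0.1], [0.6, 0.3, 0.1]]
other_docs = [[0.2, 0.3, 0.5], [0.1, 0.2, 0.7]]
kept = select_topics(author_docs, other_docs, keep=2)
vector = mixed_features(author_docs[0], kept, "Well, I bought the book!!")
```

The resulting `vector` would then be passed to whatever classifier decides whether a message matches the claimed author; the abstract does not name that classifier, so it is left out here.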

Keywords/Search Tags:

instant communication, topic model, authorship verification, feature selection

Related items

1	Chat Mining For Authorship Verification
2	Research On The Method Of Auto-discovery And Verification Of Topic-Websites
3	The Research And Implementation Of Instant Communication System Based On SIP
4	Research On The Growth Law Of High-Impact Scholars Based On Scientific Literature Communication Network
5	Distributed Instant Messaging System:Design And Implementation
6	Research On Authentication Of Online Authorship Or Article
7	Formal Modeling And Verification Of ROS Communication Mechanism
8	Research And Application Of Topic Evolution Model Based On LDA
9	Research And Realization Of Instant Communication Based On SIP
10	What is an author in the 'Sikuquanshu'? Evidential research and authorship in late Qianlong era China (1771--1795)