Implementation Of A Network Violent Language Detection System

Posted on: 2017-11-29
Degree: Master
Type: Thesis
Country: China
Candidate: R Huang
GTID: 2348330509460247
Subject: Information and Communication Engineering
Abstract/Summary:
With the rapid development of the Internet and computer networks in the 21st century, communicating and expressing our views has become increasingly convenient. The Internet has brought us an immense amount of information and expanded our space for free speech. A wide variety of speech appears in news comments, microblog posts, video bullet comments (barrage), and in-game chat. Because networks are open, virtual, and concealed, violent language is common online and causes serious psychological harm. However, most network platforms do not manage language effectively. Their basic strategy is to mask a small number of common violent words, yet the Internet remains full of violent language. A new method for detecting network violent language is therefore needed. The purpose of this paper is to build a system that can detect network violent language and accurately locate the violent words or phrases.
Based on the characteristics and manifestations of Internet violent language, this paper proposes a detection method that uses dictionaries and rules and builds on sentiment analysis. Internet violent language appears either as violent words or as phrases with a specific syntactic structure. The focus of this paper is therefore on building a dictionary of Internet violent vocabulary and on extracting rules for the syntactic structure of phrases. The dictionary is built in three steps: first, a new word segmentation dictionary is built using an HMM and context entropy; next, a small set of violent seed words is compiled manually; finally, words similar to the seed words are found by computing word-vector similarity and corpus statistics. Together, these steps yield the violent vocabulary dictionary.
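The last dictionary-building step, finding words similar to the manually compiled violent words, can be illustrated with cosine similarity over word vectors. This is a minimal sketch under stated assumptions, not the method as implemented in the thesis: the function names, the toy three-dimensional vectors, and the 0.8 threshold are all hypothetical, and the corpus-statistics part of the step is not modeled.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def expand_dictionary(seeds, vectors, threshold=0.8):
    """Add every word whose vector is close to at least one seed word.

    `threshold` is an assumed cutoff; the abstract does not state one.
    """
    known_seeds = [s for s in seeds if s in vectors]
    expanded = set(seeds)
    for word, vec in vectors.items():
        if word in expanded:
            continue
        if any(cosine(vec, vectors[s]) >= threshold for s in known_seeds):
            expanded.add(word)
    return expanded


# Toy vectors for illustration only; real vectors would be trained on a corpus.
toy_vectors = {
    "idiot": [0.9, 0.1, 0.0],
    "moron": [0.85, 0.15, 0.05],
    "hello": [0.0, 0.2, 0.9],
}
result = expand_dictionary({"idiot"}, toy_vectors)
```

With these toy vectors, "moron" lies close to the seed "idiot" and is added, while "hello" is not. In practice the threshold would need tuning against a labeled sample, since a loose cutoff pulls in harmless words.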
This paper presents χ²-FPN, a dictionary-based rule extraction method for phrases with syntactic structure that combines the χ² statistic with word frequency, part of speech, and word position. The experimental results show that χ²-FPN outperforms the plain χ² statistic, and that combining the violent vocabulary dictionary with the syntactic structure rules for phrases achieves good performance.
To reduce the false detections caused by the rules, this paper proposes a method that combines a language model with the rules. The language model is optimized by adding a probability factor to the rules. Experimental results show that the combination of rules and language model performs better still. The accuracy rate, recall rate, and F-value of the final network violent language detection system all exceed 90%, which meets the goal of the experiment.
After building the detection system, we use it to collect and create a violent language corpus.
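For reference, the chi-square (χ²) statistic used as the baseline above is commonly computed from a 2x2 contingency table of how often a candidate occurs in violent and non-violent text. The sketch below shows only this standard statistic; the counts are invented for illustration, and the additional frequency, part-of-speech, and position weighting of the FPN variant is not specified in the abstract, so it is not modeled here.

```python
def chi_square(a, b, c, d):
    """Chi-square statistic for a 2x2 contingency table.

    a: violent texts containing the candidate
    b: non-violent texts containing the candidate
    c: violent texts without the candidate
    d: non-violent texts without the candidate
    """
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        return 0.0  # a row or column is empty, so no association can be measured
    return n * (a * d - b * c) ** 2 / denominator


# Invented counts: the candidate appears in 40 of 50 violent texts
# and in 10 of 50 non-violent texts.
score = chi_square(40, 10, 10, 40)
```

A larger score means the candidate is more strongly associated with one class. Note that the statistic itself does not say which class, so a check such as a * d > b * c is needed to keep only candidates that lean toward violent text.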
Keywords/Search Tags: Network violent language, Internet violence vocabulary dictionary, Phrase with syntactic structure, Rule, Language model