Research On Detecting & Filtering Newly Coined Profanities

Posted on: 2015-04-16
Degree: Master
Type: Thesis
Country: China
Candidate: S Q Li
Full Text: PDF
GTID: 2308330464463429
Subject: Computer application technology
Abstract/Summary:
With the development of the Internet, many network applications have emerged, such as chat rooms and online forums. These applications enrich people's lives, but they also pose a threat to the healthy development of the Internet. Some malicious users post violent, pornographic, and other sensitive content, which has a harmful influence on network users, especially young people. To maintain a healthy network environment, many applications filter profanities posted by users. To avoid being filtered, users disguise these profanities or censor them only partially. For example, the word "shit" is often written as "shiiiit" or "$h!t". Recognizing such disguised profanities is therefore an important problem. This thesis presents an algorithm that recognizes disguised profanities by computing string similarity, taking both phonetic and character similarity into account. The algorithm achieves a very high identification rate on disguised profanities. To improve the efficiency of filtering harmful information and to identify a wide variety of sensitive words, this thesis also presents a client/server (C/S) filter model in which the client scans the text and the server recognizes disguised bad words. With level filtering, the terminal device scans the target text; this requires very few resources while maintaining a very high identification rate. Existing content-filtering algorithms do not recognize disguised bad words well. This thesis uses crowdsourcing to recognize these disguised profanities and achieves a very high identification rate. Experimental results show that the proposed algorithm outperforms the state-of-the-art filtering method for newly coined profanities.
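The abstract does not give the details of the similarity computation, so the following is only a minimal sketch of the general idea, not the thesis's algorithm. It assumes a hypothetical look-alike table (`LOOKALIKES`), a crude vowel-dropping `phonetic_key`, Levenshtein distance as the character measure, and an equal weighting of the two scores; all of these names and choices are illustrative assumptions.

```python
import re

# Hypothetical look-alike substitutions; the thesis does not list its own table.
LOOKALIKES = str.maketrans(
    {"$": "s", "!": "i", "@": "a", "0": "o", "1": "i", "3": "e", "5": "s", "7": "t"}
)


def normalize(word: str) -> str:
    """Lowercase, map look-alike symbols to letters, collapse repeated letters."""
    mapped = word.lower().translate(LOOKALIKES)
    letters = re.sub(r"[^a-z]", "", mapped)
    return re.sub(r"(.)\1+", r"\1", letters)


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Edit distance rescaled to [0, 1]; 1.0 means identical strings."""
    if not a and not b:
        return 1.0
    return 1.0 - edit_distance(a, b) / max(len(a), len(b))


def phonetic_key(word: str) -> str:
    """Crude phonetic key (an assumption): keep the first letter and
    drop vowels and the weak consonants h, w, y from the rest."""
    if not word:
        return ""
    return word[0] + re.sub(r"[aeiouyhw]", "", word[1:])


def disguise_score(candidate: str, bad_word: str, weight: float = 0.5) -> float:
    """Weighted mix of character similarity and phonetic similarity."""
    c, b = normalize(candidate), normalize(bad_word)
    char_sim = similarity(c, b)
    phon_sim = similarity(phonetic_key(c), phonetic_key(b))
    return weight * char_sim + (1.0 - weight) * phon_sim


if __name__ == "__main__":
    for text in ("shiiiit", "$h!t", "table"):
        print(text, disguise_score(text, "shit"))
```

Under these assumptions, both "shiiiit" and "$h!t" normalize to "shit" and score 1.0 against it, while an unrelated word such as "table" scores 0.0. Note that collapsing repeated letters also shortens legitimate words with double letters, so a real filter would need a threshold and dictionary tuned on actual data.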
Keywords/Search Tags: Content Filtering, Newly Coined Profanities, Crowdsourcing, String Similarity