
Research On Abnormal Behavior In Mass SMS Data

Posted on: 2019-06-13
Degree: Master
Type: Thesis
Country: China
Candidate: R Zhan
GTID: 2348330545962571
Subject: Electronics and Communications Engineering
Abstract/Summary:
With the continuous development of Internet communication, the traditional communication industry has been heavily affected by the Internet. Nevertheless, SMS remains a useful everyday tool as a traditional means of communication. Since SMS was introduced, merchants have used spam messages as a marketing tool, and criminals have gradually turned it into a vehicle for telecom fraud. As anti-fraud technology has advanced, telecom fraud has evolved into many different variants. How to detect telecom fraud quickly and at low cost, and how to locate it so that it can be stopped at the source, has long been an open problem, and research on it has a wide range of application scenarios.

So far, traditional text categorization based on word frequency and mutual information has reached a bottleneck in both performance and speed, while more recent text classification based on convolutional neural networks (CNNs) depends on expensive hardware. Developing an efficient text categorization system that maintains a reasonable level of accuracy while meeting the specific needs of the telecom fraud setting is therefore a new challenge for text categorization.

This thesis studies text categorization and SMS fraud extraction techniques, and analyzes the key issues in spam message classification and processing and in telecom fraud research. To balance the speed and accuracy of SMS classification, it builds a fast text classification system. Manually annotated spam samples are used for supervised learning in order to tag spam at scale and make an initial identification of fraud-type messages. Text features are then extracted from the scam messages to pinpoint the most prevalent forms of telecom fraud.

The main research content and contributions of this thesis are as follows.

First, a fast text classifier based on hierarchical softmax is designed and implemented. Without sacrificing much classification accuracy, the model trains much faster, so that model training and text clustering can be carried out effectively on mass SMS data. In traditional text categorization methods, classification accuracy is the main bottleneck, while for CNN-based systems, model training time is a serious problem. By using a single-layer neural network in the simple setting of single-label classification, combined with commonly used text classification techniques and ideas, the system can be trained on a large amount of data in a very short time. Manually annotated data provided by the operator were used for verification, and experiments show that the text classification system implemented in this thesis achieves good results.

Second, to better meet the need to accurately locate new types of telecom scams, the thesis aims to extract as many as possible of the messages that match the "Number Swiping Scams" pattern. To this end, it introduces an n-nearest-word similar-text clustering method. Similar features are extracted from similar texts to obtain the SMS clusters that match the "Number Swiping Scams" scenario, and characteristics such as call numbers are then extracted with rules. This precisely addresses the operators' specific requirements for locating "Number Swiping Scams" and provides methods and ideas for eliminating new means of fraud.
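The abstract describes the classifier only as a single-layer neural network with a hierarchical softmax output for single-label classification. The sketch below is one minimal way to realize that idea, similar in spirit to fastText: averaged word embeddings feed a binary tree (here a Huffman tree built over label frequencies), and each label's probability is the product of sigmoid decisions along its root-to-leaf path, so a training step updates only the inner nodes on one path instead of scoring every label. All names (`HSClassifier`, `build_huffman_paths`), the embedding size, the learning rate, and the toy training messages are illustrative assumptions, not the thesis's implementation.

```python
import heapq
import itertools
import math
import random
from collections import Counter


def sigmoid(x):
    # Numerically stable logistic function (avoids overflow in math.exp).
    z = math.exp(-abs(x))
    return 1.0 / (1.0 + z) if x >= 0 else z / (1.0 + z)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def build_huffman_paths(label_counts):
    # Huffman tree over label frequencies; returns {label: [(inner_id, sign), ...]}
    # (root-to-leaf path, sign +1 = left child, -1 = right child) and the inner-node count.
    tie = itertools.count()  # tie-breaker so the heap never compares tree nodes
    heap = [(n, next(tie), ("leaf", lab)) for lab, n in label_counts.items()]
    heapq.heapify(heap)
    n_inner = 0
    while len(heap) > 1:
        n1, _, left = heapq.heappop(heap)
        n2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (n1 + n2, next(tie), ("inner", n_inner, left, right)))
        n_inner += 1
    paths, stack = {}, [(heap[0][2], [])]
    while stack:
        node, path = stack.pop()
        if node[0] == "leaf":
            paths[node[1]] = path
        else:
            _, idx, left, right = node
            stack.append((left, path + [(idx, 1.0)]))
            stack.append((right, path + [(idx, -1.0)]))
    return paths, n_inner


class HSClassifier:
    # Hypothetical single-layer model: averaged word embeddings -> hierarchical softmax.
    def __init__(self, label_counts, dim=16, lr=0.5, seed=0):
        self.paths, n_inner = build_huffman_paths(label_counts)
        self.dim, self.lr, self.rng = dim, lr, random.Random(seed)
        self.emb = {}  # word -> input embedding, created lazily
        self.inner = [[0.0] * dim for _ in range(n_inner)]  # one vector per inner node

    def _hidden(self, words):
        # Hidden layer = average of the word embeddings (new words get small random vectors).
        h = [0.0] * self.dim
        for w in words:
            if w not in self.emb:
                self.emb[w] = [self.rng.uniform(-0.1, 0.1) for _ in range(self.dim)]
            for i, x in enumerate(self.emb[w]):
                h[i] += x / len(words)
        return h

    def prob(self, words, label):
        # P(label | text) = product of binary sigmoid decisions along the label's path.
        h, p = self._hidden(words), 1.0
        for node, s in self.paths[label]:
            p *= sigmoid(s * dot(self.inner[node], h))
        return p

    def train_step(self, words, label):
        # One SGD step on -log P(label | text): touches only the inner nodes on one path.
        h = self._hidden(words)
        grad_h = [0.0] * self.dim
        for node, s in self.paths[label]:
            v = self.inner[node]
            g = (sigmoid(s * dot(v, h)) - 1.0) * s  # d loss / d (v . h)
            for i in range(self.dim):
                grad_h[i] += g * v[i]
                v[i] -= self.lr * g * h[i]
        for w in words:
            for i in range(self.dim):
                self.emb[w][i] -= self.lr * grad_h[i] / len(words)

    def predict(self, words):
        return max(self.paths, key=lambda lab: self.prob(words, lab))


samples = [
    ("win a free prize now".split(), "spam"),
    ("claim free gift card today".split(), "spam"),
    ("your account is frozen transfer funds".split(), "fraud"),
    ("police case pay bail immediately".split(), "fraud"),
    ("see you at lunch tomorrow".split(), "normal"),
    ("the meeting starts at nine".split(), "normal"),
]
clf = HSClassifier(Counter(label for _, label in samples))
for _ in range(300):
    for words, label in samples:
        clf.train_step(words, label)
print([clf.predict(words) for words, _ in samples])
```

Because every inner node splits its probability mass as sigmoid(x) + sigmoid(-x) = 1, the label probabilities always sum to one, which is what lets the tree stand in for a flat softmax while keeping each update proportional to the path length rather than to the number of labels.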
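The abstract does not define the "n-nearest word" similarity. One plausible reading, assumed here, is that two messages are similar when they share many runs of n adjacent words. The sketch below represents each message as a set of word n-grams (shingles), scores pairs with Jaccard similarity, and groups messages with single-pass leader clustering at a fixed threshold; the function names, `n=2`, and `threshold=0.5` are hypothetical choices. Chinese SMS text would first need word segmentation (or character n-grams), which is omitted here.

```python
def shingles(tokens, n=2):
    # Set of n adjacent-word tuples (word n-grams); short texts become one shingle.
    if len(tokens) < n:
        return {tuple(tokens)} if tokens else set()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def jaccard(a, b):
    # Overlap of two shingle sets, in [0, 1].
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster_messages(messages, n=2, threshold=0.5):
    # Single-pass "leader" clustering: each message joins the most similar existing
    # cluster (compared against that cluster's first message) if similarity >= threshold,
    # otherwise it starts a new cluster. Returns lists of message indices.
    leaders, clusters = [], []
    for idx, text in enumerate(messages):
        sh = shingles(text.lower().split(), n)
        best, best_sim = None, threshold
        for c, leader in enumerate(leaders):
            sim = jaccard(sh, leader)
            if sim >= best_sim:
                best, best_sim = c, sim
        if best is None:
            leaders.append(sh)
            clusters.append([idx])
        else:
            clusters[best].append(idx)
    return clusters


messages = [
    "Your number has been flagged please call 13800000000 to verify",
    "Your number has been flagged please call 13911112222 to verify",
    "Meeting moved to Friday afternoon",
    "Your number has been flagged please call 13700001111 now",
]
clusters = cluster_messages(messages)
print(clusters)  # [[0, 1, 3], [2]]
```

Template-based scam messages that differ only in the embedded phone number or a trailing word still share most of their bigrams, so they land in the same cluster; this is the property needed to collect the variants of one scam script.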
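After clustering, the abstract says characteristics such as call numbers are extracted with rules. The patterns below are hypothetical examples of such rules, not the thesis's rule set: a common rule of thumb for mainland-China mobile numbers (11 digits beginning with 1 and a second digit from 3 to 9) and 400/800 service numbers. The helper `top_call_numbers` (also an illustrative name) then ranks the numbers that recur across the messages of one cluster.

```python
import re
from collections import Counter

# Hypothetical rules: mainland-China mobile numbers (11 digits, "1" then 3-9) and
# 400/800 service numbers, written with or without dashes. The lookarounds stop a
# match from starting or ending inside a longer run of digits.
MOBILE = re.compile(r"(?<!\d)1[3-9]\d{9}(?!\d)")
SERVICE = re.compile(r"(?<!\d)[48]00-?\d{3}-?\d{4}(?!\d)")


def extract_call_numbers(text):
    # Return every number matched by the rules, normalised to digits only.
    found = MOBILE.findall(text)
    found += [m.replace("-", "") for m in SERVICE.findall(text)]
    return found


def top_call_numbers(cluster_texts, k=3):
    # Count in how many messages of a cluster each number appears.
    counts = Counter()
    for text in cluster_texts:
        counts.update(set(extract_call_numbers(text)))
    return counts.most_common(k)


print(extract_call_numbers("Please call 13800000000 or 400-123-4567 now"))
# ['13800000000', '4001234567']
print(top_call_numbers([
    "Your number is flagged, call 13800000000",
    "Account frozen, call 13800000000 today",
    "Verify now: 13911112222",
], k=1))
# [('13800000000', 2)]
```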
Keywords/Search Tags: text classification, spam text, feature extraction, telecom fraud