Research And Implementation Of Sensitive Text Classification Algorithm Based On Artificial Immune System

Posted on: 2022-02-04
Degree: Master
Type: Thesis
Country: China
Candidate: H Q Xiong
GTID: 2518306602965569
Subject: Master of Engineering
Abstract/Summary:
With the continued growth of online communities, large volumes of sensitive information have appeared. This information falls roughly into three categories: pornography, politics, and violence and terrorism. Its presence does great harm to the healthy development of online society, so identifying and accurately classifying it has become an important problem. Sensitive Chinese text has characteristics such as homophonic or polyphonic words and reversed word order, which traditional classification algorithms do not handle well. The artificial immune system is a family of intelligent algorithms that imitate the biological immune system and have a strong ability to recognize antigens. Among them, the Artificial Immune Recognition Algorithm (AIRS) has comparatively good classification ability. This thesis therefore introduces the Artificial Immune Recognition Algorithm to address the problems that arise in classifying sensitive text. The main work is as follows.

(1) An affinity calculation method based on Mahalanobis distance is proposed. The affinity and stimulation calculations in the Artificial Immune Recognition Algorithm, which use Euclidean distance, cannot cope with sensitive text whose order has been rearranged. This thesis compares text similarity measures, including cosine distance and Mahalanobis distance, against the Euclidean distance used in the algorithm, on both low-dimensional and high-dimensional datasets. Mahalanobis distance gives the best classification accuracy, so it is used to redesign the affinity and stimulation calculations. Experiments show that the algorithm with the improved calculation achieves higher classification accuracy.

(2) A classification method based on the mature memory cell set is proposed. The classification step of the Artificial Immune Recognition Algorithm cannot handle the class imbalance found in sensitive text. The improved method works on the set of mature memory cells generated by the algorithm: it computes the central cell of each category and uses the central cells, together with the cells around them, as the antibodies that classify an antigen. Experiments show that the improved classification method reduces the running time of the algorithm and improves its classification accuracy on imbalanced datasets.

(3) Based on the improved Artificial Immune Recognition Algorithm, a sensitive text classification model is designed and implemented, and a labeled dataset of pornography, politics, and terrorism texts is constructed from sensitive text collected from the network. The effects of several key parameters on the model's running time and classification accuracy are analyzed and verified. Compared with current text classification models on the sensitive text datasets, the proposed model achieves higher classification accuracy.

In summary, this thesis analyzes the shortcomings of the Artificial Immune Recognition Algorithm, makes two improvements to it, and designs a sensitive text classification model on the improved algorithm. Experiments show that the model classifies accurately but has no advantage in running time. Solving this problem may require a more complete model and more efficient mathematical tools.
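
The abstract does not give the formulas behind contribution (1), so the following is only a minimal sketch of how a Mahalanobis-based affinity might be computed. It assumes that feature vectors are plain lists of floats, that the covariance matrix is estimated from a sample of cells with a small ridge term added to keep it invertible, and that the distance is squashed into [0, 1) as d / (1 + d) so that stimulation can be taken as 1 - affinity, as in the usual AIRS convention. All function names and the normalization are illustrative assumptions, not the thesis's method.

```python
import math

def covariance_matrix(rows, ridge=1e-6):
    """Sample covariance of the rows; `ridge` (an assumption) keeps it invertible."""
    n, d = len(rows), len(rows[0])
    mu = [sum(col) / n for col in zip(*rows)]
    cov = [[0.0] * d for _ in range(d)]
    for r in rows:
        diff = [x - m for x, m in zip(r, mu)]
        for i in range(d):
            for j in range(d):
                cov[i][j] += diff[i] * diff[j] / (n - 1)
    for i in range(d):
        cov[i][i] += ridge
    return cov

def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting."""
    d = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(d)]
           for i, row in enumerate(matrix)]
    for col in range(d):
        pivot = max(range(col, d), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(d):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * pv for v, pv in zip(aug[r], aug[col])]
    return [row[d:] for row in aug]

def mahalanobis(x, y, cov_inv):
    """sqrt((x - y)^T * cov_inv * (x - y))."""
    diff = [a - b for a, b in zip(x, y)]
    tmp = [sum(cov_inv[i][j] * diff[j] for j in range(len(diff)))
           for i in range(len(diff))]
    return math.sqrt(max(0.0, sum(a * t for a, t in zip(diff, tmp))))

def affinity(x, y, cov_inv):
    """Hypothetical normalization of the distance into [0, 1)."""
    dist = mahalanobis(x, y, cov_inv)
    return dist / (1.0 + dist)

def stimulation(x, y, cov_inv):
    """Stimulation as the complement of affinity (usual AIRS convention)."""
    return 1.0 - affinity(x, y, cov_inv)

# Toy feature vectors (made up for illustration only).
cells = [[1.0, 0.2, 0.0], [0.9, 0.1, 0.1], [0.1, 1.0, 0.8],
         [0.0, 0.9, 1.0], [0.5, 0.5, 0.4]]
cov_inv = invert(covariance_matrix(cells))
print(stimulation(cells[0], cells[1], cov_inv))
```

With an identity matrix passed as `cov_inv`, `mahalanobis` reduces to the Euclidean distance, which makes the two measures easy to compare on the same vectors.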
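
Contribution (2) can be read as follows: for each category, take the centroid of its mature memory cells plus a few cells nearest to that centroid, and classify an antigen by its closest antibody. The sketch below follows that reading under several assumptions: memory cells are (vector, label) pairs, `k` neighbours are kept per category, the nearest-antibody rule decides the label, and Euclidean distance is used only to keep the example short (any distance function can be passed in). The names `select_antibodies` and `classify` are hypothetical, and the thesis's actual selection rule may differ.

```python
import math
from collections import defaultdict

def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def select_antibodies(memory_cells, k=2, dist=euclidean):
    """Return {label: [centroid, up to k memory cells nearest the centroid]}."""
    by_label = defaultdict(list)
    for vec, label in memory_cells:
        by_label[label].append(vec)
    antibodies = {}
    for label, vecs in by_label.items():
        centroid = [sum(col) / len(vecs) for col in zip(*vecs)]
        nearest = sorted(vecs, key=lambda v: dist(v, centroid))[:k]
        antibodies[label] = [centroid] + nearest
    return antibodies

def classify(antigen, antibodies, dist=euclidean):
    """Assign the label of the closest antibody (assumed decision rule)."""
    best_label, best_dist = None, math.inf
    for label, cells in antibodies.items():
        d = min(dist(antigen, c) for c in cells)
        if d < best_dist:
            best_label, best_dist = label, d
    return best_label

# Toy, deliberately imbalanced memory cell set (made up for illustration).
memory_cells = [
    ([0.0, 0.0], "a"), ([1.0, 0.0], "a"), ([0.0, 1.0], "a"),
    ([1.0, 1.0], "a"), ([0.5, 0.5], "a"), ([0.2, 0.8], "a"),
    ([10.0, 10.0], "b"), ([9.0, 10.0], "b"),
]
antibodies = select_antibodies(memory_cells, k=2)
print(classify([1.0, 1.5], antibodies), classify([9.0, 8.0], antibodies))
```

Because every category contributes at most `k + 1` antibodies regardless of how many memory cells it has, a large category cannot dominate the comparison by sheer count, and each antigen is compared against far fewer cells than the full memory set.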
Keywords/Search Tags: AIRS, Sensitive text, Text classification, Distance measurement method, Mahalanobis distance