Internet Sensitive Information Identification Based On Semi-Supervised Learning

Posted on:2013-06-07

Degree:Master

Type:Thesis

Country:China

Candidate:H Wang

Full Text:PDF

GTID:2268330392970612

Subject:Computer Science and Technology

Abstract/Summary:

With the development of the Internet, people depend more and more on the network to access and publish information. The Internet can store and transmit large amounts of influential information, but it also harbors serious security threats. Criminals exploit the free, interactive nature of the Internet to spread various kinds of speech that damage social harmony; such content is called sensitive information. Once this speech spreads, it tends to have an extremely bad influence and brings enormous public-opinion pressure and economic losses. Therefore, it is necessary to identify objectionable Internet information accurately and in a timely manner.

Sensitive information propagates very quickly. Traditional machine learning methods face a serious problem: there is not enough time to label a large number of samples. Only a small number of labeled samples are available to train the classifier, with the help of a large number of unlabeled samples.

Sensitive information accounts for only a small part of online public opinion. In the collected samples, general public opinion information makes up the majority. If a classifier is trained on such data, its results are inevitably biased toward the majority class. To address this problem, over-sampling increases the number of minority-class samples, which can lead to better classifier performance.

This paper uses text classification to identify sensitive information on the Internet. Sensitive information is characterized by fast spread, harmful influence, and small quantity. Several methods are adopted to address these problems. A method is proposed that combines semi-supervised machine learning with over-sampling and improves the traditional SMOTE algorithm. Experimental results show that the improved algorithm effectively improves classifier performance.
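As an illustration of the over-sampling step mentioned in the abstract, the following is a minimal sketch of the classic SMOTE idea: each synthetic minority sample is a random point on the line segment between a minority sample and one of its nearest minority neighbours. This is the traditional algorithm only, not the improved variant proposed in the thesis, whose details are not given here. The function name `smote`, its parameters, and the toy feature vectors are assumptions made for the example.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Classic SMOTE sketch: synthesize n_new points by interpolating between
    a minority sample and one of its k nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        neighbour = minority[rng.choice(others[:k])]
        # Pick a random point on the segment between base and neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical toy data: 3 "sensitive" feature vectors versus 8 "general" ones.
sensitive = [[0.9, 0.1], [0.8, 0.2], [1.0, 0.3]]
general_count = 8
new_points = smote(sensitive, n_new=general_count - len(sensitive), k=2)
balanced_sensitive = sensitive + new_points
```

After this step the minority class has as many samples as the majority class, so a classifier trained on the combined data is less likely to be biased toward the majority class. In a real text classification setting the vectors would be document features such as TF-IDF weights rather than two-dimensional toy points.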

Keywords/Search Tags:

Sensitive Information Identification, Semi-Supervised Learning, Imbalanced Data, SMOTE

Related items

1	Route Based SMOTE Improvement Algorithm L-SMOTE
2	Identification Of Encrypted Traffic As Small Sample Of Class-imbalance
3	Research On Improvement And Parallelization Of Classification Algorithms In Imbalanced Data Sets
4	Improved Grouped SMOTE With Noise Filtering Mechanism
5	Neural Network Modeling Of Imbalance Missing Data And Its Application
6	Research On Classification Of Imbalanced Data Based On Convolutional Neural Network
7	A Research Of Cost-sensitive Classification Methods Based On LGC
8	Knowledge Discovery from Databases: Cost-sensitive and imbalance learning
9	Unbalanced Data Sampling Based On Sample Prior Distribution Information
10	Research On The Application Of Generative Adversarial Networks In Class Imbalance