Research And Implementation Of A Sensitive Content Identification Mechanism For Big Data Security

Posted on:2023-06-04

Degree:Master

Type:Thesis

Country:China

Candidate:M F An

GTID:2558306848954989

Subject:Computer technology

Abstract/Summary:

We are now in the era of big data, and protecting big data, especially sensitive data, deserves more attention. Text is the dominant form of big data. Most sensitive-data detection tools on the market rely solely on sensitive-word matching to search the text content, which leads to a very high misjudgment rate and requires substantial manual effort to screen the suspected sensitive information further. Therefore, following the idea of "data available but invisible", this thesis proposes an intelligent auxiliary identification mechanism for sensitive data: on the premise that the original data never leaves the data holder, machine learning algorithms are used to detect sensitive information in the original data, improving the accuracy of sensitive-information judgment. The main contributions of this thesis are as follows.

(1) A text readability judgment model based on the AdaBoost algorithm is proposed. The model judges text readability by constructing two base classifiers that use different text feature extraction methods, ensuring that a text is readable before it is judged for sensitivity. Experimental results show that the AdaBoost-based model determines text readability more accurately than either base classifier 1 or base classifier 2, with an accuracy above 80%.

(2) A sensitive text judgment model based on context semantics is proposed. The model detects sensitive data in text that has been judged readable. It improves the existing matching algorithm by adding handling for sensitive words that contain Pinyin, and uses a word2vec model to associate context semantics when judging whether the text is sensitive. Using the text classification corpus provided by Fudan University, experiments compare the matching algorithm with the context-semantics-based sensitive text judgment. The results show that the context-semantics-based judgment reaches an accuracy of 87% and effectively reduces the misjudgments that arise when a matching algorithm matches sensitive words by rule without considering context semantics.

(3) Building on the research and analysis of the text readability judgment model and the sensitive text judgment model, and focusing on text reading, text readability detection, and data sensitivity detection, this thesis completes the requirements analysis, overall design, implementation of the main functions, and testing of a prototype intelligent auxiliary identification tool for sensitive data.
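
The contrast drawn in contribution (2) between keyword-only matching and context-aware judgment can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the lexicon, the toy three-dimensional vectors standing in for word2vec embeddings, the topic vector, the window size, and the threshold are all assumptions chosen for illustration.

```python
import math

# Hypothetical sensitive lexicon. The Pinyin entry "mima" stands in for
# the handling of sensitive words written in Pinyin.
SENSITIVE_TERMS = {"password", "mima"}

# Toy 3-dimensional embeddings, invented for illustration only. A real
# system would look these up in a trained word2vec model.
EMBEDDINGS = {
    "login":   [0.9, 0.1, 0.0],
    "account": [0.8, 0.2, 0.1],
    "leaked":  [0.9, 0.0, 0.2],
    "movie":   [0.0, 0.9, 0.1],
    "plot":    [0.1, 0.8, 0.2],
}

# Hypothetical vector representing a "credential disclosure" topic.
SENSITIVE_TOPIC = [1.0, 0.0, 0.0]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def keyword_match(tokens):
    """Rule-only baseline: positions of tokens found in the lexicon."""
    return [i for i, tok in enumerate(tokens) if tok in SENSITIVE_TERMS]


def context_vector(tokens, index, window=2):
    """Mean embedding of known words within `window` tokens of the hit."""
    lo, hi = max(0, index - window), min(len(tokens), index + window + 1)
    vectors = [
        EMBEDDINGS[tok]
        for j, tok in enumerate(tokens[lo:hi], start=lo)
        if j != index and tok in EMBEDDINGS
    ]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]


def is_sensitive(text, threshold=0.7):
    """Flag text only if a lexicon hit occurs in a topic-related context."""
    tokens = text.lower().split()
    for i in keyword_match(tokens):
        ctx = context_vector(tokens, i)
        if ctx is not None and cosine(ctx, SENSITIVE_TOPIC) >= threshold:
            return True
    return False
```

In this sketch, `is_sensitive("account login password leaked")` is flagged, while `is_sensitive("the movie plot password")` is not, even though `keyword_match` finds a hit in both. That gap is the kind of false positive the context-semantics model is meant to remove.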

Keywords/Search Tags:

Text readability, Sensitive text, AdaBoost algorithm, SVM algorithm

Related items

1	Research On Text Readability Assessment Based On Neural Network Models
2	Research And Implementation Of Sensitive Text Classification Algorithm Based On Artificial Immune System
3	Research And Implementation Of The Detection Of Network Image Sensitive Text
4	Research On Text Detection And Location In Complex Background Images
5	A Study On Chinese Text Categorization
6	Research On Prediction Of Public Opinion In Stock Market Network Based On AdaBoost-IWOA-Elman Algorithm
7	Research On Text Representation Technologies For Readability Assessment
8	Text Representation And Algorithms For Chinese Text Classification
9	Image Sensitive Text Information Identification Based On Emotional Polarity Discrimination
10	Research On Simplification Of Automatic Chinese Text Based On Readability Evaluation