Research On Intelligent Detection Of Cross-site Scripting Attacks

Posted on: 2022-06-05 | Degree: Doctor | Type: Dissertation
Country: China | Candidate: Fawaz Mahiuob Mohammed Mokbal | Full Text: PDF
GTID: 1488306764993039 | Subject: Computer Science and Technology
Abstract/Summary:
With the spread of the Internet and the transformation of the world into a global village, web applications have gained increasing attention from companies, organizations, and individuals, and have become an integral part of daily life. As a result, they have become prime targets for cyberattacks, particularly injection attacks, which have increased significantly in recent years. Among these, cross-site scripting (XSS) attacks are at the forefront of the attention of information security experts and still pose a real challenge to users, developers, and authorities. XSS attacks are among the most common, fastest-growing, and most dangerous attacks on web applications. Their effects on users, governments, and companies are severe: theft of confidential user information (e.g., passwords and session tokens), impersonation of the user to perform authorized actions on his or her behalf, changes to the appearance or behavior of targeted websites, and theft of confidential corporate and government information. The affected organizations may suffer dire consequences, including loss of reputation, legal battles, and financial losses.

AI technology adds significant value in the cybersecurity domain. However, no public, standard training dataset exists for XSS. The data that researchers do use is often limited, selective, and significantly unbalanced; it may not meet the requirements of well-developed machine learning algorithms, and it limits the performance of the detection system. In addition, web pages mix several languages, such as HTML and JavaScript. JavaScript is not used in a uniform way in practice, which permits a variety of coding techniques that are susceptible to attack and can be triggered in different ways, so a large number of real threats go undetected. Consequently, AI-based XSS detectors still suffer from a notable false-negative (FN) rate or a non-negligible false-positive (FP) rate.

This thesis presents a set of solutions to these challenges, based on artificial intelligence algorithms, analysis, and statistical tools for data science, as follows:

(1) The thesis addresses the lack of XSS data by providing a comprehensive XSS dataset for use with AI techniques. The dataset was designed to be representative and unbiased, and was collected from more than 50,000 web apps using random walking and jumping algorithms. The resulting real-world data consists of 138,569 unique records, selected uniformly, and 179 features, ready to be reused by researchers to develop their AI models and extract suitable subsets.

(2) The thesis proposes the MLPXSS scheme together with a dynamic feature extraction (DFE) technique that extracts features from the mixed code of web app pages. DFE combines three cooperating models, each with its own function: an HTML feature extractor, a JS feature extractor, and a URL feature extractor. DFE can be integrated with any AI-based algorithm to supply training and testing data dynamically. The technique was integrated with various ML and ANN algorithms at the testing phase, and a neural-network-based multilayer perceptron (MLP) was selected as the best-fitting model with high performance. The result is a stable, high-precision, low-complexity scheme that simplifies deployment; combined with the dynamic feature extraction model, it forms an independent platform for detecting XSS-based attacks.

(3) Building on this work, a hybrid feature selection method and an extreme learning optimization approach were developed, leading to a novel detection framework named XGBXSS. The framework uses an ensemble-learning technique, Extreme Gradient Boosting (XGBoost), together with an extreme hyper-parameter optimization approach. The 179 features are extracted from the dataset using an enhanced dynamic feature extraction (EDFE) technique, which uses dictionary and brute-force methods to scale up the search space for features. A novel hybrid feature selection technique is further proposed that precisely determines the important characteristics of XSS attacks and reduces model complexity. It fuses information gain (IG) with sequential backward selection (SBS) to find an optimal subset of 30 features while preserving the detector's high efficiency. This approach bridges the research gap left by previous detectors by combining a higher detection rate with lower computational complexity and minimal FP and FN rates. It can also be implemented as a self-contained framework capable of defeating such attacks.

(4) The thesis proposes C-WGAN-GP, a generative adversarial network that addresses the highly unbalanced XSS dataset problem. It is a data augmentation algorithm that derives its power from two integrated networks, the Conditional and the Wasserstein Generative Adversarial Networks. Its advantage is that it obtains the desired data using conditional directivity and Wasserstein-1 optimization, which enhances XSS attack detection in a low-resource data environment. Furthermore, the algorithm uses the overall distribution of the minority classes, rather than local information as traditional approaches do, to generate minority-class samples that are indistinguishable from, and follow the same distribution as, those in real-world attack scenarios. The algorithm can also easily be extended and adapted to other applications, such as medical applications.

(5) The thesis also contributes NLP-SVM, a method for semi-structured data sources (e.g., text payload attacks) that combines natural language processing (NLP) with an ML scheme that generalizes well to semi-structured data. NLP-SVM processes the attack text payload with NLP to produce token vectors. The token vectors are then averaged, using the word2vec model, to obtain payload-level vectors rather than word-embedding-level vectors, and these are used to fit a support vector machine. The approach was validated on over 20,200 samples of XSS text payloads with two checks: 10-fold cross-validation and a held-out test dataset.
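
The payload-level averaging step described for NLP-SVM can be illustrated with a minimal Python sketch. It is not the thesis implementation: the `tokenize` function, the `payload_vector` helper, and the two-dimensional `toy_embeddings` table are all hypothetical stand-ins invented for this illustration, whereas the thesis obtains its token vectors from a word2vec model and then fits a support vector machine, which the sketch does not cover.

```python
import re
from typing import Dict, List, Sequence

Vector = List[float]


def tokenize(payload: str) -> List[str]:
    """Hypothetical tokenizer: lowercase the payload, then split it into
    alphanumeric runs and single punctuation characters."""
    return re.findall(r"[a-z0-9_]+|[^\sa-z0-9_]", payload.lower())


def payload_vector(tokens: Sequence[str],
                   embeddings: Dict[str, Vector],
                   dim: int) -> Vector:
    """Average the vectors of all tokens found in `embeddings`.

    Tokens without an embedding are skipped; if no token is known,
    a zero vector of length `dim` is returned.
    """
    known = [embeddings[t] for t in tokens if t in embeddings]
    if not known:
        return [0.0] * dim
    # zip(*known) walks the vectors column by column (one column per dimension).
    return [sum(column) / len(known) for column in zip(*known)]


# Toy embeddings, made up for illustration only (a real system would
# look these up in a trained word2vec model).
toy_embeddings: Dict[str, Vector] = {
    "<": [1.0, 0.0],
    "script": [0.0, 1.0],
    ">": [1.0, 0.0],
    "alert": [0.5, 0.5],
}

tokens = tokenize("<script>alert(1)</script>")
vec = payload_vector(tokens, toy_embeddings, dim=2)
print(tokens)
print(vec)
```

Each payload, whatever its length, is reduced to one fixed-length vector, which is what lets a classifier that expects fixed-size inputs be fitted on variable-length text.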
Keywords/Search Tags: cross-site scripting (XSS), machine learning, conditional Wasserstein generative adversarial network, artificial neural network, natural language processing