Research And Implementation Of Noise Filtering System For Android Malware Detection

Posted on:2023-01-26

Degree:Master

Type:Thesis

Country:China

Candidate:L Wang

Full Text:PDF

GTID:2568306914963489

Subject:Computer Science and Technology

Abstract/Summary:


The rapid growth of Android malware threatens the security of the Android platform and exposes mobile users to serious risks of fraud and cyber attacks. Android malware detection has therefore been a key research topic in mobile security in recent years. However, machine learning-based malware detection has a significant problem: the training data may contain noisy labels, which considerably degrade the performance of the detection model. The impact becomes more severe as the datasets in use continue to grow. Beyond Android malware, label noise is a common problem in machine learning datasets (e.g., image datasets), and academia has produced a large body of research and techniques for handling noisy labels. However, existing techniques are difficult to migrate to the Android malware domain because the complex composition of apps is fundamentally different from that of images. As a result, the noisy-label problem in Android malware detection has not yet been effectively solved. To address it, this thesis proposes a novel noise detection algorithm, and designs and implements a noise filtering system for Android malware detection based on it. The main work of this thesis is as follows.

(1) We propose a novel and effective noise detection method for Android malware detection. We first conduct a large-scale empirical study that reveals the unreliability of the malware labelling method commonly used in our research community. In response, we propose a noise detection algorithm based on confident learning, ensemble learning and app relationships. We migrate confident learning, an advanced noise estimation technique, to the domain of Android malware. To mitigate the bias introduced by any single model, we incorporate the idea of ensemble learning to obtain more robust results. Further, we leverage relationships between apps to improve precision.

(2) To evaluate the performance of the above method, we conduct a series of experiments from multiple perspectives. The experimental results show that our method achieves excellent and stable performance in pinpointing noisy labels, with an accuracy of over 94% and an F1 score of over 91% at noise ratios varying from 5% to 30%. In addition, compared with state-of-the-art approaches, our method achieves much better results (8% to 218% improvement) in significantly less time (70 to 249 times faster). We further show that the performance of existing malware detectors improves after noise is removed by our method. These results demonstrate that our approach can quickly and effectively reduce noisy samples in Android datasets. Meanwhile, to create a reliable and up-to-date dataset for the experiments, we design and implement an automated malware collection tool that collects over 4K real malware samples from Android-related security reports. This tool not only saves considerable labor and time in collecting data, but is also reusable and facilitates future updates and maintenance of the malware dataset.

(3) We design and implement a complete noise filtering system for Android malware detection. The system is a generic framework that reduces the noise level of the training data for any machine learning-based Android malware detector.
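The combination of confident learning and ensemble learning described in (1) can be illustrated with a minimal sketch. This is not the thesis's algorithm, whose details the abstract does not give. It assumes the standard confident-learning rule (per-class thresholds equal to the mean predicted probability over samples carrying that label) and a simple averaging ensemble. The function names, the toy probabilities and the label encoding (0 = benign, 1 = malware) are all hypothetical, and the app-relationship step is omitted.

```python
from statistics import mean


def average_probs(per_model_probs):
    """Average out-of-sample class probabilities across ensemble members.

    per_model_probs[m][i][k] is model m's probability that sample i has class k.
    Averaging is an assumed ensemble rule, chosen here for simplicity.
    """
    n_models = len(per_model_probs)
    n_samples = len(per_model_probs[0])
    n_classes = len(per_model_probs[0][0])
    return [
        [sum(m[i][k] for m in per_model_probs) / n_models for k in range(n_classes)]
        for i in range(n_samples)
    ]


def class_thresholds(probs, labels, n_classes):
    """Threshold for class k: mean probability of k over samples labelled k.

    Assumes every class appears at least once in `labels`.
    """
    return [
        mean(p[k] for p, y in zip(probs, labels) if y == k)
        for k in range(n_classes)
    ]


def flag_noisy(probs, labels):
    """Return indices whose given label disagrees with the confident class."""
    n_classes = len(probs[0])
    thresholds = class_thresholds(probs, labels, n_classes)
    flagged = []
    for i, (p, y) in enumerate(zip(probs, labels)):
        # Classes the ensemble is confident about for this sample.
        confident = [k for k in range(n_classes) if p[k] >= thresholds[k]]
        if not confident:
            continue  # no confident class, so the sample is left alone
        best = max(confident, key=lambda k: p[k])
        if best != y:
            flagged.append(i)
    return flagged


# Hypothetical toy data: two models, six apps, columns are [benign, malware].
model_a = [[0.9, 0.1], [0.8, 0.2], [0.2, 0.8], [0.1, 0.9], [0.15, 0.85], [0.85, 0.15]]
model_b = [[0.95, 0.05], [0.7, 0.3], [0.3, 0.7], [0.2, 0.8], [0.05, 0.95], [0.9, 0.1]]
labels = [0, 0, 1, 1, 0, 1]  # the last two labels contradict both models

ensemble = average_probs([model_a, model_b])
suspects = flag_noisy(ensemble, labels)
print(suspects)
```

In practice the probabilities would come from cross-validated predictions, so that no model scores a sample it was trained on. The flagged indices would then be removed or relabelled before the final detector is trained.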

Keywords/Search Tags:

Android, malware detection, noisy labels, confident learning, ensemble learning


Related items

1	Research On Malware Detection And Classification Methods For Android
2	Android Malware Detection Based On Selective Ensemble Learning
3	Research On Deep Learning Methods When Learning With Noisy Labels
4	Research On Android Malicious Code Detection Based On Ensemble Learning
5	Research On Malware Static Detection Technology Based On Android
6	A Research Of Android Malware Detection Based On Ensemble Learning
7	Research On The Android Malware Detection Technology Based On Dynamic And Static Multi-Feature
8	Research And Implementation Of Android Malware Detection Algorithm Based On Ensemble Learning
9	Research On Federated Learning Methods With Noisy Labels And Imbalanced Data
10	Research On Learning With Noisy Labels Based On Label Distribution