Characterizing Android Apps With Typical Features For Malapp Detection

Posted on: 2019-07-28
Degree: Doctor
Type: Dissertation
Country: China
Candidate: X Wang
GTID: 1368330551958090
Subject: Information security
Abstract/Summary:
Driven by the wave of the mobile Internet, smartphones have penetrated every aspect of daily life. Smartphones, which support Internet services and store user data, have become vital links between cyberspace and the physical world. However, they also bring new security risks and privacy threats. A pressing problem is the outbreak and rapid growth of malicious applications (malapps). As the most popular smartphone operating system, Android bears the brunt. Detecting malapps and keeping them out of app markets, so as to protect users' information and financial security, remains an ongoing challenge. This dissertation focuses on Android app behavior analysis and malapp detection, with the aim of identifying discriminative features and effective methods for malapp detection. Based on app sample sets collected from different sources (e.g., the official app market, third-party app markets, anti-virus companies and research teams) and different time periods (a span of two years), we propose and evaluate three malapp detection (or classification) approaches built on typical Android features: permissions, 11 types of static features, and API vectors. The main contributions of this work are as follows.

(1) We systematically explore the permission-induced risk in Android apps and discuss in depth the feasibility and limitations of malapp detection based on permission requests. First, we analyze the risk of individual permissions and of groups of collaborating permissions. We employ three feature ranking methods, namely mutual information, correlation coefficient and t-test, to rank individual Android permissions by risk, and then use sequential forward selection and principal component analysis to identify risky permission subsets. Second, we evaluate the usefulness of risky permissions for malapp detection with support vector machines, decision trees and random forests. Third, we analyze the detection results in depth and discuss the feasibility and limitations of malapp detection based on permission requests. We evaluate these methods on an official app set consisting of 310,926 benign apps and 4,868 real-world malapps. The empirical results show that our malapp detectors built on risky permissions perform satisfactorily, achieving a detection rate of 94.62% at a false positive rate of 0.6%.

(2) We extract 11 types of static features from Android apps (8 previously used types and 3 new types) and analyze their discriminative power and persistence for malapp detection. First, we extract a large number of features from each app and divide them into two groups: app-specific features and platform-defined features. These feature sets are then fed into four classifiers (Logistic Regression, linear SVM, Decision Tree and Random Forest) to detect malapps. Second, we evaluate how persistent the classification performance of app-specific and platform-defined features is, using two data sets collected in different time periods. Third, we comprehensively analyze the relevant features selected by the Logistic Regression classifier to identify the contribution of each feature set. We conduct extensive experiments on real-world app sets consisting of 217,619 benign apps, collected from six app markets as well as the Google Play market, and 18,363 malapps. The experimental results demonstrate the effectiveness of the proposed methods, which yield a best true positive rate of 96% at a false positive rate of only 0.06%.

(3) We propose a malapp family classification method based on API embeddings and convolutional neural networks (CNN). First, we apply the word embedding technique from natural language processing to convert each Android API into a real-valued vector, i.e., a distributed representation of the API. Second, we extract the raw API call sequences and represent each app as a matrix in which each row is an API vector. Third, we apply a CNN model to the input matrices of the app samples; the model jointly learns appropriate features and performs malapp classification. The proposed method is evaluated on a malapp set of 13,000 samples from 13 families. The average detection rate exceeds 88%, which is better than that of the n-gram based method and of mainstream anti-virus software.

In summary, we explore permission-induced risks, conduct a systematic study of the discriminative power and persistence of various types of static features extracted from Android APK files, and propose a malapp family classification method based on API vectors and convolutional neural networks. All proposed methods are evaluated on large real-world app sets and are of practical significance for curbing the ever-increasing growth of malapps and protecting the security and privacy of users.
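The individual-permission ranking in contribution (1) can be illustrated with a minimal sketch. It assumes each app is encoded as a binary vector of requested permissions and labeled 1 (malapp) or 0 (benign); the function names and toy data are hypothetical, and the t-test ranking is omitted for brevity. Mutual information is computed in bits from empirical frequencies, and the correlation coefficient is Pearson's r on the 0/1 values.

```python
import math
from collections import Counter


def mutual_information(feature, labels):
    """Mutual information (in bits) between a discrete feature and the labels."""
    n = len(labels)
    joint = Counter(zip(feature, labels))
    px, py = Counter(feature), Counter(labels)
    mi = 0.0
    for (x, y), count in joint.items():
        pxy = count / n
        mi += pxy * math.log2(pxy / ((px[x] / n) * (py[y] / n)))
    return mi


def pearson(feature, labels):
    """Pearson correlation coefficient; 0.0 if either variable is constant."""
    n = len(labels)
    mx, my = sum(feature) / n, sum(labels) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(feature, labels))
    vx = sum((a - mx) ** 2 for a in feature)
    vy = sum((b - my) ** 2 for b in labels)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def rank_permissions(apps, labels, names):
    """Return (name, MI, r) per permission column, sorted by MI, highest risk first."""
    scores = []
    for j, name in enumerate(names):
        column = [row[j] for row in apps]
        scores.append((name, mutual_information(column, labels), pearson(column, labels)))
    return sorted(scores, key=lambda s: s[1], reverse=True)


# Hypothetical toy data: rows are apps, columns are requested permissions.
names = ["SEND_SMS", "INTERNET", "READ_PHONE_STATE"]
apps = [[1, 1, 1], [1, 1, 1], [1, 1, 0], [0, 1, 1], [0, 1, 0], [0, 1, 0]]
labels = [1, 1, 1, 0, 0, 0]
ranking = rank_permissions(apps, labels, names)
```

On this toy data SEND_SMS separates the two classes perfectly (1 bit of mutual information, r = 1), while INTERNET, requested by every app, carries no information and ranks last.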
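Contribution (1) also identifies risky permission subsets with sequential forward selection. A minimal sketch of that greedy search follows. The scoring function is a hypothetical stand-in (accuracy of a "flag the app if it requests any selected permission" rule); a real pipeline would more likely score each candidate subset with a trained classifier.

```python
def sequential_forward_selection(candidates, score, max_size):
    """Greedily grow a feature subset while the score strictly improves."""
    selected, remaining = [], list(candidates)
    best_score = float("-inf")
    while remaining and len(selected) < max_size:
        # Try adding each remaining candidate to the current subset.
        trials = [(score(selected + [c]), c) for c in remaining]
        top_score, top = max(trials, key=lambda t: t[0])
        if top_score <= best_score:
            break  # no single addition helps, so stop early
        selected.append(top)
        remaining.remove(top)
        best_score = top_score
    return selected, best_score


def any_permission_rule(apps, labels, names):
    """Hypothetical scorer: accuracy of 'flag the app if it requests any selected permission'."""
    index = {name: i for i, name in enumerate(names)}

    def score(subset):
        correct = 0
        for row, label in zip(apps, labels):
            flagged = any(row[index[p]] for p in subset)
            if int(flagged) == label:
                correct += 1
        return correct / len(labels)

    return score


# Hypothetical toy data: rows are apps, columns follow `names`; label 1 = malapp.
names = ["SEND_SMS", "READ_CONTACTS", "INTERNET"]
apps = [[1, 0, 1], [0, 1, 1], [1, 0, 1], [0, 0, 1], [0, 0, 1], [0, 0, 0]]
labels = [1, 1, 1, 0, 0, 0]
selected, best = sequential_forward_selection(
    names, any_permission_rule(apps, labels, names), max_size=3
)
```

Here SEND_SMS alone reaches 5/6 accuracy, adding READ_CONTACTS reaches 1.0, and adding INTERNET would lower the score, so the search stops with a two-permission subset. The greedy search evaluates at most on the order of n² subsets instead of all 2ⁿ, at the cost of possibly missing the optimal combination.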
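The results above are reported as a detection rate (true positive rate) at a given false positive rate, e.g., 94.62% at 0.6%. The sketch below shows how such an operating point can be read off a scoring classifier, assuming higher scores mean "more likely malicious" and that both classes are present. The function names and scores are hypothetical, and the dissertation may choose its operating points differently.

```python
def rates(y_true, y_pred):
    """Return (true positive rate, false positive rate); assumes both classes occur."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp / (tp + fn), fp / (fp + tn)


def tpr_at_fpr(scores, y_true, max_fpr):
    """Sweep thresholds from high to low; keep the lowest one whose FPR <= max_fpr."""
    best = None
    for threshold in sorted(set(scores), reverse=True):
        preds = [1 if s >= threshold else 0 for s in scores]
        tpr, fpr = rates(y_true, preds)
        if fpr > max_fpr:
            break  # FPR can only grow as the threshold drops
        best = (threshold, tpr, fpr)
    return best


scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]  # hypothetical classifier outputs
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
at_25 = tpr_at_fpr(scores, y_true, max_fpr=0.25)
at_0 = tpr_at_fpr(scores, y_true, max_fpr=0.0)
print(at_25)  # (0.4, 1.0, 0.25)
print(at_0)   # (0.7, 0.75, 0.0)
```

Tightening the allowed false positive rate from 25% to 0% raises the threshold and drops the detection rate from 100% to 75%, which is why a detection rate is only meaningful together with the false positive rate at which it was measured.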
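Contribution (3) turns each app into a matrix whose rows are API vectors and lets a CNN learn features from it. The sketch below shows only the forward pass of a few convolutional filters with ReLU and max-over-time pooling, a common layout for CNNs over embedded sequences. It is not the dissertation's architecture, which the abstract does not detail. The 2-dimensional embeddings and hand-set filters are hypothetical: in practice the embeddings would come from a word-embedding model trained on API call sequences, the filters would be learned during training, and the pooled features would feed a classification layer that predicts the malapp family.

```python
def embed_sequence(api_calls, embeddings, dim):
    """Map each API call to its vector; APIs without an embedding become zero rows."""
    return [list(embeddings.get(api, [0.0] * dim)) for api in api_calls]


def conv_max_pool(matrix, kernel, bias=0.0):
    """Slide a (k x dim) filter over k consecutive rows, apply ReLU, keep the maximum."""
    k = len(kernel)
    if len(matrix) < k:
        raise ValueError("API sequence is shorter than the filter height")
    activations = []
    for start in range(len(matrix) - k + 1):
        window = matrix[start:start + k]
        z = bias + sum(w * x for krow, mrow in zip(kernel, window) for w, x in zip(krow, mrow))
        activations.append(max(0.0, z))  # ReLU
    return max(activations)  # max-over-time pooling


# Hypothetical 2-d embeddings; a trained model would learn these from API call sequences.
embeddings = {
    "android.telephony.TelephonyManager.getDeviceId": [0.0, 1.0],
    "android.telephony.SmsManager.sendTextMessage": [1.0, 0.0],
    "java.net.URL.openConnection": [0.5, 0.5],
}
calls = [
    "android.telephony.TelephonyManager.getDeviceId",
    "android.telephony.SmsManager.sendTextMessage",
    "java.net.URL.openConnection",
    "com.example.Unknown.call",  # hypothetical API with no embedding
]
matrix = embed_sequence(calls, embeddings, dim=2)
# Two hand-set height-2 filters: "getDeviceId then sendTextMessage" and the reverse order.
filters = [[[0.0, 1.0], [1.0, 0.0]], [[1.0, 0.0], [0.0, 1.0]]]
features = [conv_max_pool(matrix, f) for f in filters]
print(features)  # [2.0, 1.5]
```

The first filter responds most strongly (2.0) where getDeviceId is immediately followed by sendTextMessage. Because of the max pooling, a filter spanning k consecutive rows detects a local API pattern regardless of where it occurs in the call sequence.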
Keywords/Search Tags: Android Security, Malapp Detection, Permission-Induced Risk, CNN, API Vector