| Mobile smart devices and the mobile Internet have greatly changed people’s daily lives. Tasks that previously required multiple devices, such as sending messages, browsing the web, taking photos, shopping, and making payments, can now be done on a single smart handheld device such as a smartphone. This concentration of functions means that smartphones generate and store a large amount of private user information. Although there are various smartphone operating systems, Android has rapidly come to dominate the market thanks to its openness. However, that same openness makes the privacy threats posed by Android applications (apps) particularly serious. The data that Android apps can obtain falls into three categories: system data protected by the permission mechanism, such as SMS and location; system data not protected by the permission mechanism, such as memory and CPU usage percentages and embedded sensor data; and app data generated by the app itself, such as the user’s in-app action data. When Android apps access this data, privacy leaks may occur that threaten user privacy. Some of these behaviors have already attracted the attention of researchers. For example, for system data protected by the permission mechanism, malicious apps can steal users’ sensitive information by requesting the corresponding permissions; for embedded sensor data, which is not protected by the permission mechanism, carefully designed apps can use machine learning to infer users’ sensitive information from the sensor data. Other behaviors have not yet attracted researchers’ attention. For example, for the in-app action data generated by the app itself, a third-party mobile analytics library that collects this data may disclose the user’s sensitive information to the third party. For the privacy threats described above, existing studies have proposed some analysis and mitigation methods, but these
studies still have some deficiencies. For Android malicious app detection, the proposed permission-based detection methods do not consider the permissions actually used by the app. For the privacy threats posed by embedded sensors on Android platforms, existing studies have shown that malicious apps can exploit embedded sensors to infer users’ private information, and some researchers have proposed corresponding sensor data access control mechanisms. However, because the sensor usage patterns of existing apps have not been analyzed, these mechanisms all rely on the user’s own judgment and cannot automatically generate appropriate access control policies for different apps. For the privacy threats posed by third-party mobile analytics libraries, there is a lack of research on what personal information can be leaked through the in-app behavior data these libraries collect and on how to prevent this type of leakage. Therefore, this dissertation takes Android apps as its research object and studies the following aspects with the aim of protecting users’ private information. First, we propose a two-layer permission-based Android malicious app detection method that takes the permissions used by apps into consideration. Second, we propose a method for generating sensor data propagation graphs of Android apps and analyze existing apps’ sensor usage patterns based on the generated graphs. Based on this work, we design and implement a sensor data access control mechanism for Android apps that can automatically generate access control policies. Finally, we use a combination of static and dynamic analysis to determine what information third-party mobile analytics libraries collect in popular Android apps and how this affects user privacy. Based on this work, we design and implement a mechanism to manage third-party mobile analytics libraries. In summary, we make the following
contributions: (1) We propose a two-layer permission-based Android malicious app detection method that considers the permissions used by apps. In the first layer, the permissions requested by apps and the permission pairs requested by apps are each used to train a classifier and classify the apps. According to the results of these two classifiers, the apps are divided into a benign set, a malicious set, and an uncertain set. In the second layer, the permissions requested by apps and the permissions used by apps are combined in several ways to classify the apps in the uncertain set. The best classification results of the second layer are merged with those of the first layer to form the final classification results. The proposed method is evaluated on a dataset of 19,369 benign apps and 8,694 real-world malicious apps. The empirical results show that it achieves a true positive rate of 89.07% and an accuracy of 95.09% with a false positive rate of 2.21%. This result is better than that obtained by considering only the permissions requested by apps. (2) We analyze the sensor usage patterns of existing Android apps from the perspective of an entire Android market. For this purpose, we design and implement an analysis tool called SDFDroid (Sensor Data Flow Droid). After an in-depth study of the Dalvik instruction format, we work out a sensor data taint propagation policy oriented to Smali code. Based on this policy, SDFDroid performs both forward and backward sensor data taint analysis on the Smali code obtained by decompiling the apps, in order to generate the apps’ sensor data propagation graphs and identify the types of sensors the apps use. Using these graphs, we analyze whether the apps leak sensor data. We propose an algorithm named NHGK-DBSCAN to cluster the apps’ sensor data propagation graphs. This
algorithm employs the NHGK algorithm to calculate the similarity between apps’ sensor data propagation graphs and clusters the graphs with the DBSCAN algorithm. According to the clustering results, we summarize existing apps’ sensor usage patterns. With SDFDroid, we analyze 22,010 apps from a mainstream Chinese app market called AppChina, 7,601 apps from Google Play, and 4,644 apps from another Chinese app market called AnZhi. The analysis results show that, apart from some ad libraries that send sensor data to web pages in order to display ads, existing Android apps use sensor data only locally; running games and displaying ads are the two main purposes for which Android apps use sensors; apps usually use sensors through third-party libraries; the sensor data propagation graphs of third-party libraries remain almost unchanged over a period of time; different categories of apps usually use different types of sensors; the accelerometer is the most frequently used sensor; and although most apps use only one type of sensor, some third-party libraries register 11 types of sensors. To control an app’s access to sensor data and mitigate the privacy threats posed by embedded sensors, we design and implement a sensor data access control mechanism named SensorDataGuardian. Given an app, SensorDataGuardian uses the analysis results from SDFDroid to automatically configure the app’s sensor data access control policy. SensorDataGuardian refines the objects of sensor data access control from whole apps down to the third-party libraries used by the apps. The test results show that the performance overhead introduced by SensorDataGuardian is acceptable to users. (3) We study the user privacy threats posed by the third-party mobile analytics libraries integrated into popular Android apps. We design and implement an analysis tool called Alde. Given an app, Alde employs both static and dynamic analysis to extract the user’s in-app action data collected by the third-party mobile analytics
libraries. In the analysis process, we propose a method to identify obfuscated APIs based on the APIs’ method call graphs, and a method for dynamically running apps based on the priority of the apps’ GUI elements; both improve the coverage of the analysis. The in-app action data extracted by Alde is then analyzed manually to identify what private information is leaked. With Alde, we have analyzed 8 widely used analytics libraries as well as 200 apps from Chinese app markets and 100 apps from Google Play. The experimental results show that analytics libraries can be exploited by malicious developers to collect users’ personal information directly; some apps do leak users’ personal information to analytics companies even though their actual purposes in using analytics libraries are legitimate; users will be deeply profiled if an analytics company links the information collected from different apps, especially in China; and developers seldom describe their use of analytics libraries in their apps’ privacy policies. To mitigate the user privacy leakage caused by third-party mobile analytics libraries, we design and implement a mechanism named ALManager to manage the third-party mobile analytics libraries in other apps. The test results show that the performance overhead introduced by ALManager is acceptable to users. |
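
As a quick consistency check on the detection figures reported in contribution (1), the overall accuracy can be recomputed from the dataset sizes, the true positive rate, and the false positive rate. The sketch below is not part of the dissertation's tooling; the function name `accuracy_from_rates` and its parameter names are illustrative assumptions, and only the four numeric inputs come from the abstract.

```python
# Recompute overall accuracy from class sizes, TPR and FPR.
# The numbers are taken from the abstract; all names are illustrative.

def accuracy_from_rates(n_malicious, n_benign, tpr, fpr):
    """Return overall accuracy given class sizes, TPR and FPR (as fractions)."""
    true_positives = tpr * n_malicious        # malicious apps correctly flagged
    true_negatives = (1.0 - fpr) * n_benign   # benign apps correctly passed
    return (true_positives + true_negatives) / (n_malicious + n_benign)


accuracy = accuracy_from_rates(
    n_malicious=8694, n_benign=19369, tpr=0.8907, fpr=0.0221
)
print(f"implied accuracy: {accuracy * 100:.2f}%")
```

Under these inputs the implied accuracy rounds to 95.09%, which agrees with the accuracy stated above.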