
Research On Key Technologies Of Homologous Analysis Of Mobile Applications

Posted on: 2021-04-28
Degree: Doctor
Type: Dissertation
Country: China
Candidate: C P Zhang
GTID: 1368330605981237
Subject: Information security
Abstract/Summary:
With the rapid development of mobile networks and communication technologies, the mobile Internet has become fully integrated into people's daily lives. As mobile smart terminals have grown more popular, the number of mobile applications has exploded. Android still holds a major share of the mobile operating system market. Because of the openness of the Android system and the growth of the open source community, code reuse is widely used in the development of Android applications. Code reuse lets developers build their own Apps with less effort and fewer resources and reduces redundancy by building on existing code. It also lets big companies collect user data by making their code reusable for more and more public developers, which can maximize the profits of both sides. Driven by market demand and the openness of the platform, code reuse has thus become prevalent in Android App development. Understanding the overall state of code reuse in today's Android App stores and analyzing the different reasons for it can help developers optimize their development process and can advance related research on mobile application analysis.

At the same time, as mobile smart terminals become more popular, the accompanying security issues have become increasingly serious. Malware mostly performs malicious acts such as privacy theft, system destruction, terminal control, and malicious fee deduction without the user's awareness, which causes direct economic losses and potential privacy leakage for users. Code reuse is a convenient way for malware writers to spread their malicious code. Developers who reuse their own code can also easily carry vulnerabilities in that code over into applications they release later. In addition, owing to the open nature of the Android ecosystem, Android applications are vulnerable to cracking and tampering. App repackaging (App cloning) has become a major threat to the Android ecosystem. Malicious developers can easily replace the Ads in legitimate applications and upload the results to official stores or various third-party stores. They then use false descriptions to disguise the malicious applications they publish and deceive users into downloading them. Therefore, detecting and analyzing inconsistencies between an application's description and its behavior can effectively identify malicious and abnormal applications in an App store.

With the continuous improvement of advanced code mutation technology, unscrupulous developers can easily create a large number of malware or forged App variants by repackaging. To address this problem, we need a method that can identify the developer of an App and thereby curb the spread of malware and forged Apps at the source. Homology analysis of mobile applications can be used to identify or classify the developers of Apps and to distinguish original authors from plagiarists. Its results help predict the types of tools and technologies used by a particular malicious code author and help study how malware spreads and evolves. In recent years, a number of research results on author identification for mobile applications have been published. However, one open challenge has not been addressed by previous work: although state-of-the-art techniques can accurately detect App clone pairs, it is non-trivial for them to pinpoint the authorship of App clones automatically, which greatly limits the usage scenarios of App clone detection techniques. Identifying the original authors is therefore a critical research topic.

Homology analysis of mobile applications must build on program analysis and related techniques such as code decompilation, static analysis, machine learning, and natural language processing. Complemented by mobile-application-specific analysis techniques such as third-party library detection and repackaged application detection, this dissertation carries out research on basic code clone detection and inconsistency detection in order to complete the author attribution of Android Apps. The main research contents and innovations of this dissertation include:

1) Based on an in-depth study of the overall state of code reuse in the official Android App store, we propose a code clone detection method for Android applications. We establish a mobile application analysis model based on multi-polarity feature descriptions and build a code reuse database covering more than 400,000 mobile applications. To complement existing code reuse research, this dissertation proposes a multi-level code reuse analysis framework for mobile applications that is suitable for large-scale analysis. It studies code reuse at three different coding levels and compiles four lists of code-reuse-related information. The reasons for code reuse at the different coding levels are analyzed in detail, and the relationship between package-level and class-level code reuse is discussed in depth. We also analyze code reuse by different types of developers, classifying developers according to the categories and the number of Apps they have released.

2) Using Natural Language Processing (NLP)-based description extraction and Application Program Interface (API)-based behavior features, we check App behavior against App description in the context of third-party libraries (TPLs). First, we use a Latent Dirichlet Allocation (LDA) model to identify 30 topics from a large number of application descriptions. Then, according to the relevance of each application to each topic, we use an optimized K-means++ algorithm to build a more efficient model that groups the Apps into 29 clusters by functionality. Finally, we use the Isolation Forest (IF) algorithm to perform anomaly detection on the Apps within each cluster. In an extensive experiment on nearly 300,000 Apps, we show that more than half of the outlier Apps are no longer identified as outliers after TPL filtering, and that additional new outliers can be identified.

3) This dissertation proposes an innovative method based on the LightGBM machine learning framework to identify the original authors of App clones, and evaluates the effectiveness of the method through experiments in a variety of settings. Our study is a pioneering work dedicated to tackling the challenge of authorship attribution for repackaged Apps in a systematic way. Our approach achieves over 80% accuracy when predicting the authorship of a given App among hundreds of developers. We also compare the importance of different features for authorship attribution.
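The package-level and class-level reuse analysis in contribution 1 can be illustrated with a minimal sketch. It is not the dissertation's method: the abstract does not specify the features or the matching algorithm, so the sketch assumes a simple exact-match scheme in which each class is fingerprinted by hashing its sorted method descriptors and each package by hashing its sorted class fingerprints. The input layout, the function names, and the sample data are all hypothetical.

```python
import hashlib
from collections import defaultdict


def _digest(items):
    """SHA-256 over the sorted items, so ordering does not affect the result."""
    return hashlib.sha256("\n".join(sorted(items)).encode("utf-8")).hexdigest()


def fingerprint_app(app):
    """Fingerprint one app at two levels.

    `app` maps package name -> {class name -> list of method descriptors}
    (an assumed, simplified view of a decompiled APK). Package and class
    names are deliberately not hashed, so plain renaming does not hide reuse.
    Returns (set of package fingerprints, set of class fingerprints).
    """
    package_fps, class_fps = set(), set()
    for classes in app.values():
        fps = [_digest(methods) for methods in classes.values()]
        class_fps.update(fps)
        package_fps.add(_digest(fps))
    return package_fps, class_fps


def find_reuse(apps):
    """Return, per level, the fingerprints that occur in two or more apps."""
    index = {"package": defaultdict(set), "class": defaultdict(set)}
    for app_id, app in apps.items():
        package_fps, class_fps = fingerprint_app(app)
        for fp in package_fps:
            index["package"][fp].add(app_id)
        for fp in class_fps:
            index["class"][fp].add(app_id)
    return {
        level: {fp: sorted(ids) for fp, ids in table.items() if len(ids) > 1}
        for level, table in index.items()
    }


# Hypothetical sample data: app_b ships a renamed copy of app_a's ad library.
apps = {
    "app_a": {
        "com.ads.sdk": {"Loader": ["(Ljava/lang/String;)V", "()I"], "Tracker": ["()V"]},
        "com.a.ui": {"Main": ["(Landroid/os/Bundle;)V"]},
    },
    "app_b": {
        "a.b": {"x": ["()I", "(Ljava/lang/String;)V"], "y": ["()V"]},
        "com.b.ui": {"Home": ["(Landroid/os/Bundle;)V", "()Z"]},
    },
    "app_c": {
        "com.c.util": {"Helper": ["()V"]},
    },
}

reuse = find_reuse(apps)
for level, shared in reuse.items():
    for fp, app_ids in shared.items():
        print(level, fp[:12], app_ids)
```

In this toy data the renamed library is matched at the package level between `app_a` and `app_b`, while the trivial single-method class is shared by all three apps at the class level only. That gap between levels is the kind of package-level versus class-level relationship the abstract refers to. Exact hashing breaks under any code mutation, so a large-scale study would likely need fuzzier features than the ones assumed here.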
Keywords/Search Tags: Android Apps, software homology, code reuse analysis, abnormal App detection, code authorship attribution