Android has become the most widely used mobile operating system in the world, and Android applications, which bring great convenience to users, are an important part of its ecosystem. Repackaged applications are simple to produce, and their appearance and behavior closely match those of the original applications. Attackers trick users into downloading and installing them in order to, for example, steal user information or divert developers' revenue. Researchers currently detect repackaged applications by comparing code, interfaces, and resources, but the accuracy and efficiency of detection are still challenged by the huge number of applications and the continuing development of obfuscation and packing techniques.

This thesis studies repackaging detection with a focus on detection efficiency and resilience to obfuscation and packing. It proposes a two-phase repackaging detection method based on icon and layout files and implements a prototype detection engine, ReHunter. The method addresses two problems of layout-based detection: repeated layout matching, and the effect of element position on feature accuracy. The engine has two parts: fast filtering of large-scale application sets, followed by accurate detection of repackaged applications. The main work is as follows:

(1) A fast filtering scheme for suspicious repackaged applications based on icon and layout files. This thesis extracts a fast application feature from the icon and layout files. The problem is thereby transformed into finding similar vectors in the feature vector set, which is done with the Navigating Spreading-out Graph (NSG) approximate nearest neighbor search algorithm, so that suspicious repackaged applications can be filtered from a large-scale application set. This addresses the long detection time on large-scale application sets. The fast application feature consists of an icon feature vector and an application feature vector, and the application feature vector is generated from the layout feature vectors. The
experimental results show that, compared with AndroGuard, SimiDroid, and FSquaDRA2, the ReHunter fast filtering scheme has the lowest false positive rate and the shortest running time.

(2) A third-party library filtering method based on the layout hash. This thesis generates a layout hash from each layout feature vector. Because third-party libraries are widely used, identifying a third-party library layout is reduced to checking whether the same layout hash occurs many times. The experimental results show that the filtering scheme is only slightly less accurate than LIBID, but it can also detect template layouts. Compared with Orlis, LibScout, LibPecker, and LIBID, it has the shortest running time and the best scalability.

(3) A dynamic threshold algorithm. Based on the distribution of similarity values in repackaging detection, this thesis proposes a dynamic threshold algorithm based on the maximum interval. It addresses the wide range of similarity values, for which no suitable fixed threshold can be determined. The experimental results show that, compared with a fixed threshold, the dynamic threshold algorithm guarantees both a high recall and a high filtering rate.

(4) A high-precision detection scheme for repackaged applications based on layout files. This thesis proposes a scheme that generates layout fingerprints and computes similarity from weighted layout fingerprints. The scheme eliminates the effect of element position on the similarity comparison, so that layouts whose elements appear in different positions but render identically are not treated as different. An application-level similarity comparison is also proposed, which eliminates the effects of repeated layout matching and resource obfuscation. The experimental results show that, compared with AndroGuard, SimiDroid, and FSquaDRA2, the ReHunter high-accuracy detection scheme has the lowest false negative rate and the strongest resilience to obfuscation and packing techniques.
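
The maximum-interval idea behind the dynamic threshold in item (3) can be illustrated with a minimal sketch. The abstract does not give the algorithm's details, so the following is only one plausible reading: sort the similarity scores, find the widest gap between neighboring scores, and place the threshold in the middle of that gap. The names `max_gap_threshold` and `filter_candidates` are hypothetical and do not come from ReHunter.

```python
from typing import Dict, List, Sequence


def max_gap_threshold(similarities: Sequence[float]) -> float:
    """Return the midpoint of the widest gap between sorted scores.

    Assumption: "maximum interval" means the largest difference between
    two neighboring similarity values; the thesis may define it differently.
    """
    if len(similarities) < 2:
        raise ValueError("need at least two similarity scores")
    ordered = sorted(similarities)
    best_gap = -1.0
    threshold = ordered[0]
    # Scan neighboring pairs and remember the widest gap seen so far.
    for low, high in zip(ordered, ordered[1:]):
        gap = high - low
        if gap > best_gap:
            best_gap = gap
            threshold = (low + high) / 2
    return threshold


def filter_candidates(scores: Dict[str, float]) -> List[str]:
    """Keep the applications whose similarity lies above the dynamic threshold."""
    threshold = max_gap_threshold(list(scores.values()))
    return sorted(name for name, score in scores.items() if score > threshold)


if __name__ == "__main__":
    scores = {"a": 0.10, "b": 0.15, "c": 0.20, "d": 0.85, "e": 0.90}
    print(max_gap_threshold(list(scores.values())))
    print(filter_candidates(scores))
```

Unlike a fixed cut-off, the threshold here moves with each query's score distribution, which is the property the abstract attributes to the dynamic threshold.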
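
The layout-hash filtering in item (2) can likewise be sketched under stated assumptions. The abstract does not specify the hash function or how "many times" is measured, so this sketch uses SHA-256 over a serialized feature vector and treats a layout as library code when its hash appears in at least `min_apps` distinct applications. Both choices, and all function names, are illustrative assumptions and not the thesis's implementation.

```python
import hashlib
from collections import Counter
from typing import Dict, Iterable, List, Sequence, Set

Vector = Sequence[int]


def layout_hash(feature_vector: Vector) -> str:
    """Hash a layout feature vector (SHA-256 is an assumed choice)."""
    data = ",".join(str(value) for value in feature_vector).encode("utf-8")
    return hashlib.sha256(data).hexdigest()


def library_layout_hashes(apps: Dict[str, Iterable[Vector]], min_apps: int) -> Set[str]:
    """Return hashes that occur in at least `min_apps` distinct applications."""
    counts: Counter = Counter()
    for layouts in apps.values():
        # Use a set so each hash is counted once per application.
        counts.update({layout_hash(vector) for vector in layouts})
    return {h for h, n in counts.items() if n >= min_apps}


def strip_library_layouts(layouts: Iterable[Vector], library_hashes: Set[str]) -> List[Vector]:
    """Drop layouts whose hash was flagged as third-party library code."""
    return [v for v in layouts if layout_hash(v) not in library_hashes]


if __name__ == "__main__":
    apps = {
        "app1": [(1, 2, 3), (9, 9, 9)],
        "app2": [(1, 2, 3), (4, 5, 6)],
        "app3": [(1, 2, 3), (7, 8, 9)],
    }
    libs = library_layout_hashes(apps, min_apps=3)
    print(strip_library_layouts(apps["app1"], libs))
```

Because the check is a hash lookup against precomputed counts, it does not require pairwise comparison of library code.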