
Research On The Large Scale Anti-Phishing Detection Engine

Posted on: 2013-11-30
Degree: Doctor
Type: Dissertation
Country: China
Candidate: J Y Zhang
GTID: 1228330374999776
Subject: Computer Science and Technology
Abstract/Summary:
Phishing is a significant form of criminal fraud in electronic communication. It is not a simple scam that obtains a user's password by masquerading as a trusted email or website. Behind every attack lie a complex underground phishing economy (the "black chain") and social engineering that manipulates people into divulging confidential information. Since the first recorded attacks in 1987, phishing has become the most dangerous form of fraud in today's online trading and e-commerce. It not only poses a serious threat to the security of financial transactions, but also has a long-term impact once users' private data are disseminated. Moreover, to maximize their profits, phishers operating in different domains cooperate ever more closely, and attack cycles have sped up notably. Because of its "low threshold, high yield" characteristics, phishing is becoming the cornerstone of Internet criminal activity.
In recent years, phishers have broken the balance between attack and defense by exploiting the bucket effect in online transactions, in which the weakest link determines overall security. Dynamic passwords are easily bypassed, SMS verification has been defeated, and PKI private keys have been stolen. Hence, relying solely on the internal technical strength of the impersonated enterprises is not enough. Moreover, increasingly complex defenses intensify the conflict between security and convenience: cumbersome authentication mechanisms are gradually reaching the limits of users' tolerance. With the development of e-commerce, the number of active online banking users has increased by 300 million. The huge economic benefits will stimulate phishers to develop more online frauds, making the security of online transactions ever more critical.
To solve the phishing problem and enable a simple, secure transaction process, more research is needed at the network traffic layer. With efficient and reliable phishing detection methods, we aim to completely cut off the black hands reaching into online transactions. This dissertation focuses on phishing URL identification and phishing page detection: classifying phishing attacks based entirely on inbound URLs, and recognizing phishing targets based on the pages and samples in the repositories. The contributions of this dissertation to the problem of detecting phishing attacks are summarized as follows:
1. We introduce a multi-platform, desktop-level anti-phishing evaluation model and reveal for the first time many potential problems in the phishing detection field. Using this evaluation model, we are the first to establish an automated testing method on multiple operating systems to estimate the performance of different anti-phishing technologies. Using multiple live phishing URL feeds, we conducted four experiments on ten anti-phishing tools, including browsers and browser plug-ins. Based on these evaluations, we identify new problems that these mechanisms must face and present the findings to inform future anti-phishing design.
2. We introduce a large-scale, multi-domain phishing URL detection model. We reveal new aspects of the features common to phishing URLs and introduce a statistical machine learning classifier that relies on these selected features to detect phishing sites. We use this detection strategy as a real-time, language-independent anti-phishing solution in our integrated detection engine, which partly maintains the blacklists of some security vendors. Unlike previous studies, we do not use a single model for different regions, since our analysis shows that the features in different phishing domains have mismatched distributions. Because it is impossible to re-collect enough data and rebuild the models, we adapt the existing model with a transfer learning algorithm. Comprehensive experiments show that the proposed method achieves high accuracy on a balanced dataset and small error rates in a simulated real-world phishing scenario. Moreover, the good performance in the target domain demonstrates that transfer learning is feasible in the anti-phishing scenario.
3. We introduce a fast phishing page classification method based on locality sensitive hashing (LSH), and present a page verification and brand labeling model built on this classification method. After analyzing and deriving the LSH algorithms, we propose an optimization based on the random projection of LSH. Through an optimization factor, we introduce the location information of page features into the hyperplane projection as a means of distance adjustment. Based on this idea, we implement the phishing page classification method. Using a sliding window to extract the page text and the DOM structure, we compute the fingerprint of the target page with the proposed optimized model and compare it against a massive set of sample records to perform phishing detection and brand labeling. Comprehensive experiments demonstrate that the proposed method has better identification ability and achieves satisfactory results.
4. We introduce a semantic understanding method for automatically detecting phishing pages. To identify newly emerging phishing attacks and cope with semantic attacks launched by phishers, we present a novel method that enhances our template expansion ability. Based on the linguistic characteristics of phishing pages, we design a phishing domain ontology and a corresponding description model that turn the comparison of page texts into text understanding. Experimental results demonstrate that the proposed method achieves high accuracy with satisfactory processing speed in phishing identification, and therefore show that phishing attacks can be detected at the semantic level.
5. We introduce a large-scale architecture for anti-phishing detection. Integrating the research described above, we design a large-scale detection engine that is in active use. Along with the design details and experimental evaluation, operating statistics illustrate the reliability and efficiency of the proposed engine in a real application environment and affirm the value of our research.
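The fingerprinting step of the third contribution (sliding window, random-projection LSH, comparison against sample records) can be illustrated with a minimal sketch. This is not the dissertation's algorithm: it uses a generic SimHash-style random hyperplane scheme, and the names `fingerprint`, `label_brand`, the `position_weight` parameter, the 64-bit width, and the distance threshold are all assumptions made for illustration. In particular, `position_weight` is only a hypothetical stand-in for the position-based distance adjustment that the abstract mentions without specifying.

```python
import hashlib

BITS = 64  # fingerprint width; an assumed value, not taken from the dissertation


def shingles(tokens, width=3):
    """Slide a window of `width` tokens over the page; yield (position, shingle)."""
    for i in range(max(len(tokens) - width + 1, 1)):
        yield i, " ".join(tokens[i:i + width])


def fingerprint(text, width=3, position_weight=0.0):
    """SimHash-style fingerprint: sign of a weighted sum of pseudo-random +/-1 vectors.

    position_weight > 0 gives earlier shingles more influence (hypothetical
    illustration of using feature position in the projection).
    """
    tokens = text.lower().split()
    n = max(len(tokens) - width + 1, 1)
    totals = [0.0] * BITS
    for pos, shingle in shingles(tokens, width):
        # The first 64 bits of the hash serve as a pseudo-random +/-1 vector.
        digest = hashlib.sha256(shingle.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        weight = 1.0 + position_weight * (1.0 - pos / n)
        for bit in range(BITS):
            totals[bit] += weight if (h >> bit) & 1 else -weight
    return sum(1 << bit for bit in range(BITS) if totals[bit] > 0)


def hamming(a, b):
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")


def label_brand(page_text, samples, max_distance=20):
    """Return the brand of the nearest sample fingerprint, or None if none is close.

    `samples` maps a brand name to a stored fingerprint (hypothetical layout).
    """
    if not samples:
        return None
    target = fingerprint(page_text)
    brand, dist = min(
        ((b, hamming(target, fp)) for b, fp in samples.items()),
        key=lambda pair: pair[1],
    )
    return brand if dist <= max_distance else None
```

In this sketch, pages whose text differs in only a few shingles tend to produce fingerprints that differ in few bits, so brand labeling reduces to a Hamming-distance lookup. A real deployment against a massive sample set would index the fingerprints rather than scan them linearly.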
Keywords/Search Tags: user privacy protection, phishing detection, URL feature recognition, transfer learning, page similarity detection, semantic understanding