The Research And Implementation Of Web Malware Detection Based On Page Content

Posted on:2012-10-08

Degree:Master

Type:Thesis

Country:China

Candidate:W Wei

Full Text:PDF

GTID:2218330362456566

Subject:Information security

Abstract/Summary:

In recent years, malware such as worms, Trojans, and botnets has posed a persistent threat to Internet security. With the growing popularity of Web 2.0 and cloud computing, more and more applications provide web-based services, and the browser is increasingly taking on the role of an operating system. Attackers now exploit vulnerabilities in browsers and plug-ins rather than vulnerabilities in operating systems and applications. Web malicious code has become the main channel for attacks and for spreading malware, and an important part of the underground economy.

A malicious web page is a page containing malicious content that spreads viruses, Trojans, and similar malware. This content is commonly called a "web Trojan", although it is not actually a Trojan. It is malicious code distributed through web pages, generally written in JavaScript, VBScript, or another scripting language, and usually obfuscated in various ways to escape detection. By exploiting vulnerabilities in browsers or plug-ins, this code can download and run malware such as adware, Trojans, and viruses. Users can be attacked even when they visit a seemingly benign website, because a benign page may have been injected with malicious code. Attackers use various tactics, such as encryption and polymorphism, to evade antivirus scanners, so traditional detection systems have a high false negative rate. As a result, more and more attackers use the Internet to spread malware.

Detection techniques are usually classified into static detection (based on page content or URL), dynamic detection (based on browsing behavior), and combinations of the two. Traditional static detection is simple but copes poorly with code obfuscation, which leads to high false negative and false positive rates. Many existing systems therefore use dynamic detection: they run the scripts of a web page in a real browser inside a virtual machine and monitor the execution for malicious activity. While this approach is quite accurate, it is costly, requiring seconds per page without optimization, and therefore cannot be applied to a large set of web pages.

This thesis proposes a lightweight detection system. The system analyzes pages, extracts features, and automatically derives detection models using machine-learning techniques. In addition, a JavaScript virtual machine is used to further analyze obfuscated code, complementing the static analysis of the page source. Because most of the analysis uses only the page source and requires no execution, the system consumes fewer resources and can be applied to large-scale web page detection, for example through integration with a search engine. We systematically analyze the characteristics of malicious web pages and present the features that are important for machine learning. Finally, we describe the design and implementation of the system and demonstrate its effectiveness with experimental results.
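
The static, content-based feature extraction outlined in the abstract can be illustrated with a short sketch. This is a minimal illustration under stated assumptions, not the system described in the thesis: the abstract does not list the features actually used, so the feature set here (hidden iframes, counts of calls such as eval and unescape, script entropy, longest string literal) and all names (PageFeatureParser, extract_features, SUSPICIOUS_CALLS) are hypothetical. It uses only the Python standard library and never executes the page.

```python
import math
import re
from collections import Counter
from html.parser import HTMLParser

# Hypothetical list of JavaScript calls often associated with obfuscation.
# This list is an assumption for illustration, not taken from the thesis.
SUSPICIOUS_CALLS = ("eval", "unescape", "fromCharCode", "document.write")


class PageFeatureParser(HTMLParser):
    """Collects inline script text and counts iframes that look hidden."""

    def __init__(self):
        super().__init__()
        self.in_script = False
        self.scripts = []
        self.hidden_iframes = 0

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "script":
            self.in_script = True
        elif tag == "iframe":
            style = (attrs.get("style") or "").replace(" ", "").lower()
            tiny = attrs.get("width") in ("0", "1") or attrs.get("height") in ("0", "1")
            if tiny or "display:none" in style or "visibility:hidden" in style:
                self.hidden_iframes += 1

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_script = False

    def handle_data(self, data):
        if self.in_script:
            self.scripts.append(data)


def shannon_entropy(text):
    """Shannon entropy in bits per character; 0.0 for empty input."""
    if not text:
        return 0.0
    total = len(text)
    return -sum(n / total * math.log2(n / total) for n in Counter(text).values())


def extract_features(html):
    """Return a dict of static features computed from the page source only."""
    parser = PageFeatureParser()
    parser.feed(html)
    parser.close()
    script = "".join(parser.scripts)
    literals = re.findall(r"(['\"])(.*?)\1", script)
    features = {
        "script_length": len(script),
        "hidden_iframes": parser.hidden_iframes,
        "script_entropy": shannon_entropy(script),
        "longest_string_literal": max((len(body) for _, body in literals), default=0),
    }
    for name in SUSPICIOUS_CALLS:
        pattern = r"\b" + re.escape(name) + r"\s*\("
        features["calls_" + name] = len(re.findall(pattern, script))
    return features


if __name__ == "__main__":
    page = (
        '<html><body><iframe src="http://example.invalid/x" width="0" height="0">'
        '</iframe><script>eval(unescape("%75%6e%65%73%63%61%70%65"));</script>'
        "</body></html>"
    )
    for key, value in extract_features(page).items():
        print(key, value)
```

A feature dictionary of this kind would then be turned into a numeric vector and passed to a trained classifier. Which learning algorithm to use, and how to handle pages whose scripts are too heavily obfuscated for static features alone, are left open in this sketch.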

Keywords/Search Tags:

web malicious code, drive-by download, static detection, dynamic detection, machine learning

Related items

1	Detection And Prevention Of Malicious Websites
2	Anomaly Detection Of JavaScript-based Malicious Web Pages
3	Research On Drive-by Download Detection Based On Machine Learning
4	Research And Implementation Of Obfuscated Drive By Download Attack Detection Technology
5	JavaScript Malicious Code Detection System
6	Research Of Bootkit Static Detection Method Based On Disk Data Search
7	Research On Malicious Code Detection Technology Based On Android Platform
8	Malicious Code Detection Technology Based On Machine Learning Algorithm
9	Research And Implementation Of Malicious Web Page Code Detection Technology
10	Research On Android Malicious Application Detection Method Based On Dynamic And Static Combination