
Design And Construction Of Distributed JS Parsing System

Posted on: 2015-02-17
Degree: Master
Type: Thesis
Country: China
Candidate: W Huang
Full Text: PDF
GTID: 2268330425976167
Subject: Communication and Information System
Abstract/Summary:
With the rapid development of Internet technology, web pages have become increasingly rich. Web page programming now involves not only HTML and CSS but also a variety of dynamic scripting languages. JavaScript, the most representative of these, is very powerful, but JavaScript programming is more complex than traditional static web page technology. In the fields of search engines and information acquisition, it is very difficult to obtain the information hidden in scripts. The purpose of this thesis is therefore to design and construct a system that parses the JavaScript embedded in HTML in a distributed manner.
The thesis has two major research directions: first, to design an algorithm that extracts JS scripts from HTML pages and parses them; second, to analyse task scheduling algorithms and design one for this system using Hadoop distributed computing technology. By studying the rules of JavaScript grammar and the forms in which JavaScript appears in HTML, the thesis designs the process and algorithm for JavaScript extraction, built on a JavaScript parsing engine. This is the first module. By studying Map/Reduce task scheduling algorithms in light of the characteristics of the JavaScript parsing task and of the distributed environment, the thesis also identifies the Map/Reduce task scheduling algorithm best suited to the system, so that JavaScript parsing tasks run in a reasonable manner. A distributed JS parsing system was then constructed. To check the accuracy and performance of the system, the thesis tests it, summarizes its deficiencies, and finally suggests improvements.
The distributed JS parsing system can parse large numbers of JS scripts in HTML pages quickly and efficiently. The experimental results show that the system effectively extracts the text and URLs contained in JS scripts. The research can therefore provide reliable technical support for search engines, public opinion analysis, and data acquisition.
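As a rough illustration of the first module (locating JavaScript inside HTML and pulling URLs out of it), the sketch below uses only the Python standard library. It is not the thesis's algorithm: the abstract says the extraction is built on a JavaScript parsing engine, whereas this example only scans script text with a regular expression and would miss URLs assembled at run time. The names `ScriptExtractor`, `extract_scripts`, and `extract_urls` are hypothetical.

```python
import re
from html.parser import HTMLParser


class ScriptExtractor(HTMLParser):
    """Collect inline script bodies, external script URLs, and on* handlers."""

    def __init__(self):
        super().__init__()
        self.inline_scripts = []   # text found between <script> and </script>
        self.external_srcs = []    # values of <script src="...">
        self.event_handlers = []   # values of onclick=, onload=, and similar
        self._in_script = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        for name, value in attrs:
            if name.startswith("on") and value:
                self.event_handlers.append(value)
        if tag == "script":
            src = dict(attrs).get("src")
            if src:
                self.external_srcs.append(src)
            self._in_script = True
            self._buffer = []

    def handle_data(self, data):
        # Script content may arrive in several chunks, so buffer it.
        if self._in_script:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_script:
            body = "".join(self._buffer).strip()
            if body:
                self.inline_scripts.append(body)
            self._in_script = False


# Assumption: absolute http(s) URLs written literally in the script text.
URL_RE = re.compile(r"https?://[^\s\"'<>]+")


def extract_scripts(html):
    """Parse an HTML string and return the populated extractor."""
    parser = ScriptExtractor()
    parser.feed(html)
    parser.close()
    return parser


def extract_urls(script_text):
    """Return literal http(s) URLs that appear in a piece of script text."""
    return URL_RE.findall(script_text)
```

In a Map/Reduce setting, a function like `extract_scripts` would plausibly run inside each map task over one page at a time; how the thesis actually partitions and schedules that work is not stated in the abstract.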
Keywords/Search Tags: JavaScript parsing, Hadoop, Map/Reduce task scheduling algorithm, JavaScript parsing engine