
Time Synchronization Algorithm For Spider System Based On Hadoop

Posted on: 2017-06-22
Degree: Master
Type: Thesis
Country: China
Candidate: B Dai
Full Text: PDF
GTID: 2348330503480753
Subject: Computer technology
Abstract/Summary:
With the rapid popularization of the Internet, the number of Chinese netizens has reached nearly 750 million, accounting for 20% of the global total. A user base this large generates an enormous volume of information. The rapid development of network platforms and increasingly frequent communication let people exchange information faster, improve the economic structure, and enable free expression. In recent years, new economic models such as Internet finance, combined online and offline services, and e-commerce have boosted the development of our society. There is another side of the coin, however: too much freedom makes cybercrime difficult to control, especially gambling, pornography, and anti-government speech. The government therefore emphasizes the healthy development of the Internet and, as technology has advanced, has begun a campaign of Internet monitoring.

Against this background, we focus on algorithms for topic searching. First, we introduce the background and significance of this research and review the current state of the field. Next, we introduce the related technologies and theories, including the Hadoop distributed system, the basics of web spiders, complex networks, and time synchronization. We then describe the spider system for sensitive data and its basic data medium, and we build and analyze complex network models using complex network theory. After that, we identify the requirements for time synchronization and propose an improved time synchronization method. Finally, we carry out a simulation experiment for that method based on NS2.

The research sets aside traditional text analysis methods and adopts a new method that focuses on time features. The experiment shows that our approach can improve the accuracy of time synchronization and thus better reflect topic trends, which is vital to data monitoring.
Keywords/Search Tags: Sensitive information, Detection, Hadoop, Complex networks, TPSN
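
The abstract does not describe the improved synchronization method itself, so it is not reproduced here. For orientation only, the following is a minimal sketch of the standard two-way message exchange used by TPSN (Timing-sync Protocol for Sensor Networks), which the keywords name. The class name `Exchange`, the function names, and the example timestamps are hypothetical and chosen for illustration; this is the textbook calculation, not the thesis's improved algorithm.

```python
from dataclasses import dataclass


@dataclass
class Exchange:
    """Four timestamps of one two-way exchange (names are illustrative)."""
    t1: float  # request sent, read on the child's clock
    t2: float  # request received, read on the parent's clock
    t3: float  # reply sent, read on the parent's clock
    t4: float  # reply received, read on the child's clock


def tpsn_offset_and_delay(ex: Exchange) -> tuple[float, float]:
    """Return (offset, delay) for one exchange.

    offset is how far the parent's clock is ahead of the child's;
    delay is the one-way propagation delay, assumed symmetric.
    """
    offset = ((ex.t2 - ex.t1) - (ex.t4 - ex.t3)) / 2
    delay = ((ex.t2 - ex.t1) + (ex.t4 - ex.t3)) / 2
    return offset, delay


def corrected_time(local_time: float, offset: float) -> float:
    """Map a child-clock reading onto the parent's timescale."""
    return local_time + offset


# Assumed scenario: the child's clock runs 5 units behind the parent's,
# and each message takes 2 units to arrive.
example = Exchange(t1=100.0, t2=107.0, t3=108.0, t4=105.0)
offset, delay = tpsn_offset_and_delay(example)
```

The calculation relies on the link delay being the same in both directions; any asymmetry shows up directly as offset error, which is one reason refinements to the basic exchange are studied.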