Design and Implementation of a Microblog Acquisition System Based on Page Prediction

Posted on: 2016-12-11
Degree: Master
Type: Thesis
Country: China
Candidate: F L Meng
GTID: 2208330470950833
Subject: Computer technology
Abstract/Summary:
Today's society is in an era of information explosion. With the rapid development of network information, people can publish news online anytime and anywhere; the network has clearly become part of our lives and is changing the way we live. Complex information spreads most rapidly on micro-blogs, and capturing micro-blog information accurately and in real time has become a major problem. As a result, the monitoring, acquisition, and preprocessing of micro-blog information, together with the collection of related information, have become a research hotspot in information processing.

This thesis analyzes advanced micro-blog acquisition systems at home and abroad and, through a survey of a large number of academic articles and tests, studies the technologies related to micro-blog collection, including page denoising, web page anticipation (prediction), crawler technology, normalization, and regular expressions. It develops a micro-blog information acquisition system based on web page anticipation. The system is implemented in C# with a SqlServer2005 database and acquires data from Sina channels, including Sina entertainment, Sina sports, breaking news, movies, and literature. Compared with single-purpose micro-blog acquisition systems, this system has a significant advantage: it supports fuzzy queries and batch acquisition by theme according to user requirements, so it is not limited to acquisition from the channels that have been added. The system uses a SqlServer2005 database named Microblogtag, whose main tables are the microblogs table, the microblogstxt table, the microblogsback table, and the Adminstate table.

The system is divided into four modules: the login interface, the channel acquisition module, the theme acquisition module, and the data import and export module. This thesis describes in detail the design and implementation of the channel acquisition module, the theme acquisition module, and the import and export module. The core of the system is the channel acquisition module and the theme acquisition module, both of which collect micro-blog information. On the one hand, the system can collect automatically according to a theme entered by the user; on the other hand, it can import and export data when the user needs it, which increases the flexibility of the system.

Taking theme acquisition as an example, tests show that the system effectively avoids the phenomenon in which changes in a web page's hash value deviate from changes in its content, and that it solves the authentication problem caused by the crawler repeatedly performing simulated logins when fetching URLs. Experiments show that the method acquires micro-blog information quickly and in real time and provides accurate data for large-scale public opinion analysis.
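The abstract does not say how the system keeps a page's hash value in step with its content. One common approach, shown here only as a minimal Python sketch and not as the thesis's C# implementation, is to denoise the page first and hash only the remaining text, so that changes in scripts, styles, or comments do not alter the hash. The noise patterns and the function names `denoise`, `content_hash`, and `has_changed` are assumptions made for illustration.

```python
import hashlib
import re

# Hypothetical noise patterns; the thesis does not list the ones it uses.
_NOISE_PATTERNS = [
    re.compile(r"<script\b.*?</script>", re.IGNORECASE | re.DOTALL),
    re.compile(r"<style\b.*?</style>", re.IGNORECASE | re.DOTALL),
    re.compile(r"<!--.*?-->", re.DOTALL),
]
_TAG = re.compile(r"<[^>]+>")
_WHITESPACE = re.compile(r"\s+")


def denoise(html: str) -> str:
    """Drop scripts, styles, comments, and tags, then normalize whitespace."""
    for pattern in _NOISE_PATTERNS:
        html = pattern.sub(" ", html)
    text = _TAG.sub(" ", html)
    return _WHITESPACE.sub(" ", text).strip()


def content_hash(html: str) -> str:
    """Hash only the denoised text, not the raw markup."""
    return hashlib.sha256(denoise(html).encode("utf-8")).hexdigest()


def has_changed(old_html: str, new_html: str) -> bool:
    """Report a change only when the visible text differs."""
    return content_hash(old_html) != content_hash(new_html)
```

Under this sketch, two fetches of the same post that differ only in an embedded script or an HTML comment produce the same hash, while an edit to the post text produces a different one. A regex-based denoiser is a simplification; a real crawler would more likely use an HTML parser.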
Keywords/Search Tags: Micro-blog acquisition system, Web crawler, Data capture, Webpage anticipation