The Research And Implementation Of Information Assets Based On Big Data Analysis Of WEB Information

Posted on: 2018-11-16
Degree: Master
Type: Thesis
Country: China
Candidate: Y Zheng
Full Text: PDF
GTID: 2348330512496674
Subject: Computer technology
Abstract/Summary:
With the rapid development of web technology and the construction of digital campuses, the web and other information resources of a campus network have become large in quantity and scale, complex in structure, and fast-changing. This overwhelms campus information asset managers: managing these resources and monitoring their running state has become very complicated. Most asset management systems currently available for campus networks rely on manual active monitoring, followed by manual configuration and data import. Monitoring therefore lags behind and cannot reflect the real-time status of campus network information assets in a timely, accurate, and comprehensive way.

From the perspective of web mining and big data analysis, this thesis studies multi-threaded crawling for web information collection, thread pool scheduling, and IP and port scanning. It also studies information preprocessing techniques: MD5-based de-duplication of large-scale URL sets, Simhash-based removal of duplicate web pages, and DOM-based page parsing. Finally, it studies feature matching algorithms for detecting malicious URLs, hidden ("dark chain") links, vulnerabilities, and other malicious behavior. These techniques and algorithms were then applied in a project that implements a campus network information asset management system based on web information.

The system is divided into two layers. The first layer handles information acquisition and preprocessing. Starting from an application-layer URL, it uses a web crawler together with SNMP network detection and scanning to collect the information assets within the campus: web page information, IP addresses, domains, web servers, and port information. It then removes noise and duplicates from the collected data, and integrates and stores the valuable information assets. The second layer is the visual presentation layer for campus network information assets. It extracts valuable multi-dimensional data on demand and provides information query and management functions. It implements modules for hierarchical management of campus web links, hardware server information management, server system information management, site running-state monitoring, and campus network security management. The system monitors the running status of campus network information assets concisely and dynamically. For example, it monitors the state of campus network servers and the usage of the various types of campus network, and it checks for hidden links, vulnerabilities, malicious code, and other security risks.

During system design and implementation, requirements analysis was carried out first and the functions of the system were abstracted. Based on this analysis, the overall architecture and the function modules were designed, and each module was then designed and implemented in detail. Finally, the implemented system is presented and tested, and the operating results are analysed and summarised.
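
The two de-duplication steps named in the abstract (MD5 for URLs, Simhash for page content) can be illustrated with a minimal Python sketch. It is not the thesis's implementation. The function names, the URL normalisation rule, the 64-bit fingerprint width, the use of MD5 as the per-token hash inside Simhash, and the Hamming-distance threshold of 3 are all assumptions chosen for illustration.

```python
import hashlib
import re

BITS = 64  # assumed fingerprint width for this sketch


def url_fingerprint(url: str) -> str:
    """MD5 hex digest of a lightly normalised URL.

    The normalisation (trim whitespace, drop the fragment and any
    trailing slash) is an assumption, not taken from the thesis.
    """
    normalised = url.strip().split("#", 1)[0].rstrip("/")
    return hashlib.md5(normalised.encode("utf-8")).hexdigest()


def dedupe_urls(urls):
    """Keep the first occurrence of each URL, compared by MD5 fingerprint."""
    seen = set()
    unique = []
    for url in urls:
        fp = url_fingerprint(url)
        if fp not in seen:
            seen.add(fp)
            unique.append(url)
    return unique


def simhash(text: str) -> int:
    """64-bit Simhash over word tokens, each token weighted equally."""
    weights = [0] * BITS
    for token in re.findall(r"\w+", text.lower()):
        # First 8 bytes of the token's MD5 digest give a 64-bit hash.
        digest = hashlib.md5(token.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        for i in range(BITS):
            weights[i] += 1 if (h >> i) & 1 else -1
    fingerprint = 0
    for i, w in enumerate(weights):
        if w > 0:
            fingerprint |= 1 << i
    return fingerprint


def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


def is_near_duplicate(page_a: str, page_b: str, threshold: int = 3) -> bool:
    """Treat two pages as duplicates if their fingerprints are close.

    The threshold is a hypothetical default, not a value from the thesis.
    """
    return hamming_distance(simhash(page_a), simhash(page_b)) <= threshold
```

In a crawler, `dedupe_urls` (or the `seen` set behind it) would filter the URL frontier before fetching, while `is_near_duplicate` would compare the extracted text of fetched pages. Exact MD5 matching suits URLs, where only identical strings count as repeats, whereas Simhash tolerates small differences in page content.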
Keywords/Search Tags: Big Data Analysis, Web Mining, Web Crawler, Information Asset Management System, Big Data Visualization