User Web Information Collection And Analysis System Based On The Smart Router

Posted on: 2017-02-18
Degree: Master
Type: Thesis
Country: China
Candidate: S J Peng
GTID: 2308330488952024
Subject: Electronics and Communications Engineering
Abstract/Summary:
The arrival of the information age has made the Internet the most important source of information for individuals and families alike. With the abundance of multimedia content and the maturing of wireless network technology, more and more users connect to the Internet through a variety of intelligent terminal devices, and this has become the mainstream way of obtaining information in modern society. A wide range of convenient Internet-based services has emerged as a result, and the development of big data techniques has made Internet companies realize that user information is a strategic resource of extremely high economic value. Extracting users' Internet information and analyzing their online behavior is therefore of great significance both for advancing academic research and for helping enterprises retain their customers.

Currently, the main data sources for analyzing users' online behavior are server logs and browser cookies. The former are generated in a given format from users' online behavior while they are logged in to a target website; the latter are produced by scripts on the website that transmit the user's online information to the background server. Both methods, however, are limited to specific websites, whereas ideally information about online behavior would be obtainable no matter which websites users browse. As the hub of home network connectivity and data distribution, the router plays a vital role in home networking.

Taking full advantage of the router, this paper designs and implements a user web information collection and analysis system based on the smart router, which overcomes the limited means of obtaining user information and the one-sidedness of existing information collection. The system is divided into two parts, the gateway and the background server.
The gateway is responsible for extracting and transmitting each user's ID and the URLs visited. Based on the information collected from the gateway, the background server implements the main functions: text extraction from the corresponding web pages, page browsing time statistics, sub-link crawling and correlation calculation, and text topic classification. The innovations mainly cover the following five aspects:
(1) Based on an analysis of the system's environmental requirements and application scenarios, this paper puts forward a text extraction algorithm that combines text density with multiple features. The algorithm improves extraction speed while maintaining the accuracy of web page text extraction.
(2) The paper proposes a TF-IDF keyword extraction algorithm that combines statistical, structural, and linguistic analysis. Because the algorithm takes into account the effect of word length and word span on performance, it overcomes the traditional algorithm's total dependence on word frequency statistics.
(3) A theme crawling strategy is designed based on the proposed keyword extraction algorithm and the VSM (vector space model) text similarity measure; it performs sub-link crawling and correlation calculation across two levels of pages.
(4) A Bayes text classification algorithm that emphasizes the relationship between categories and features is proposed, which improves the accuracy of text classification.
(5) This paper proposes an overall design for the web user information collection and analysis system. The system's functions are implemented in software on a smart routing platform running OpenWrt. Test results show that the system performs well and meets the expected design requirements.
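As a rough illustration of how aspects (2) and (3) fit together, the sketch below scores keywords with plain TF-IDF and ranks sub-links by VSM cosine similarity to the parent page. It is not the thesis's algorithm: the word-length and word-span weighting described above is omitted, and the tokenizer, the stop-word list, the smoothed IDF formula, the sample pages, and all function names are assumptions made for this example.

```python
import math
import re
from collections import Counter

# Illustrative stop-word list (an assumption, not taken from the thesis).
STOPWORDS = {"a", "and", "for", "of", "the", "with"}


def tokenize(text):
    """Lowercase, split on non-alphanumeric characters, drop stop words."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS]


def idf_table(docs_tokens):
    """Smoothed inverse document frequency for every term in the corpus."""
    n = len(docs_tokens)
    df = Counter()
    for tokens in docs_tokens:
        df.update(set(tokens))  # count each term once per document
    return {t: math.log((1 + n) / (1 + d)) + 1.0 for t, d in df.items()}


def tfidf_vector(tokens, idf):
    """Sparse TF-IDF vector (term -> weight); plain frequency-based weighting."""
    if not tokens:
        return {}
    counts = Counter(tokens)
    return {t: (c / len(tokens)) * idf.get(t, 0.0) for t, c in counts.items()}


def top_keywords(vec, k=3):
    """Highest-weighted terms; ties are broken alphabetically."""
    ranked = sorted(vec.items(), key=lambda kv: (-kv[1], kv[0]))
    return [t for t, _ in ranked[:k]]


def cosine(u, v):
    """VSM similarity: cosine of the angle between two sparse vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical page texts: one parent page and two of its sub-links.
pages = {
    "parent": "router firmware update guide for openwrt router",
    "link_a": "openwrt router firmware download and install guide",
    "link_b": "chocolate cake recipe with butter and sugar",
}
tokens = {name: tokenize(text) for name, text in pages.items()}
idf = idf_table(list(tokens.values()))
vectors = {name: tfidf_vector(toks, idf) for name, toks in tokens.items()}

# Correlation of each sub-link with the parent page.
scores = {name: cosine(vectors["parent"], vectors[name]) for name in ("link_a", "link_b")}
ranked_links = sorted(scores, key=scores.get, reverse=True)
```

Here `ranked_links` orders the sub-links by relevance to the parent page, so a crawler could follow only those above some threshold. The thesis applies this kind of correlation calculation across two levels of pages; the sketch covers a single level.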
Keywords/Search Tags:web information collection, keywords extraction, web spider, feature extraction, text classification