
The Network Users' Privacy Information Protection Model Based On Site Classification

Posted on: 2013-11-21
Degree: Master
Type: Thesis
Country: China
Candidate: Y Xia
Full Text: PDF
GTID: 2248330395950797
Subject: Communication and Information System
Abstract/Summary:
With the rapid development of information technology, the Internet has become a critical part of everyday life. In recent years, Internet privacy has become one of the most important problems facing web users, and various kinds of privacy abuse have been reported. For example, users' personal information is sold between companies for profit, and browsing behavior is tracked to facilitate discriminatory advertising. These issues show that web users lack an efficient privacy protection system. This thesis proposes such a system based on website classification. The main innovations are as follows:

1. The HBWC algorithm is proposed for classifying Chinese websites based on homepage features. Webpage classification has been researched intensively around the world, but few studies focus on the classification of whole websites, and related experimental data are scarce, especially for Simplified Chinese websites. We analyze Chinese word segmentation, feature selection and the vector space model of homepages in depth, and devise a new HTML tag weighting scheme that differs from the one used for ordinary webpage classification. Experimental results show an improvement in classification accuracy.

2. Different classifiers are chosen according to the time and space budgets of OpenID servers and browsers. KNN is used on the server side and SVM on the client side. Together with the word segmentation, feature selection, vector space model and tag weighting methods, they make up the HBWC-S and HBWC-C algorithms. Their classification accuracy reaches 86.6% and 89.7%, respectively, with the same amount of training data.

3. Based on the HBWC algorithm, a privacy protection model is proposed to restrict the leakage of users' attribute information and behavior information. The behavior protection model partitions the browser cookie store according to the category of the first-party website, which prevents third-party advertising companies from tracking users across websites of different categories. The attribute protection model monitors both browser form submission and the OpenID attribute exchange protocol, and detects potential privacy leaks when a website requests personal information that is incompatible with its nature. The user is then alerted and asked for confirmation.

4. An extension of the OpenID protocol is proposed to integrate the server-side and client-side HBWC algorithms. With minor user involvement, this communication scheme attains 94.9% overall classification accuracy on Chinese websites.
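The cookie partitioning idea in the behavior protection model can be illustrated with a minimal sketch. This is not the thesis's implementation: the class name, the `classify_site` lookup table and the example hosts are all hypothetical, and a fixed dictionary stands in for the HBWC classifier. The only idea taken from the abstract is that cookies are stored per first-party website category, so a third party sees the same cookie only within one category.

```python
from collections import defaultdict


class CategoryPartitionedCookieJar:
    """Toy cookie store keyed by (first-party category, cookie domain).

    Hypothetical illustration only; a real browser cookie store also
    handles paths, expiry, and security flags.
    """

    def __init__(self, classify):
        # classify: callable mapping a first-party host to a category label.
        self._classify = classify
        self._store = defaultdict(dict)

    def set_cookie(self, first_party, cookie_domain, name, value):
        # The partition depends on the category of the site being visited,
        # not on the individual site.
        partition = (self._classify(first_party), cookie_domain)
        self._store[partition][name] = value

    def get_cookies(self, first_party, cookie_domain):
        partition = (self._classify(first_party), cookie_domain)
        # Return a copy so callers cannot mutate the store.
        return dict(self._store.get(partition, {}))


# Hypothetical lookup table standing in for the website classifier.
SITE_CATEGORIES = {
    "news.example": "news",
    "daily.example": "news",
    "clinic.example": "health",
}


def classify_site(host):
    return SITE_CATEGORIES.get(host, "unknown")


jar = CategoryPartitionedCookieJar(classify_site)

# A third-party ad server sets an identifier while the user reads a news site.
jar.set_cookie("news.example", "ads.example", "uid", "abc123")

# Another news site shares the partition, so the identifier is visible there.
same_category = jar.get_cookies("daily.example", "ads.example")

# A health site falls in a different partition, so the identifier is absent.
other_category = jar.get_cookies("clinic.example", "ads.example")
```

In this sketch the third party can still link visits within one category, which matches the abstract's claim that tracking is prevented across websites of different categories rather than within a category.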
Keywords/Search Tags: Privacy Protection, Machine Learning, Website Classification, Tag Weighting, OpenID Protocol