
The Design And Implementation Of The SPAM-Evaluation System For A Social Networking Sites Based On Web Development

Posted on: 2015-01-29
Degree: Master
Type: Thesis
Country: China
Candidate: Z Feng
Full Text: PDF
GTID: 2268330431956343
Subject: Computer technology
Abstract/Summary:
With the widespread adoption of the Internet, a large number of social networking platforms have sprung up in China and abroad, represented by Facebook, Weibo, Baidu Tieba, and Tianya Community. What these sites have in common is that they are all UGC (User Generated Content) platforms: their content is produced primarily by users, and every user can publish content of their own. As a result, the content on these sites grows rapidly and becomes broad and varied. UGC websites therefore play an important role in the accumulation and dissemination of knowledge.
Baidu Tieba, the world's largest Chinese-language community site, has accumulated hundreds of millions of registered users, millions of forums, hundreds of millions of total daily topics, and hundreds of millions of monthly active users over more than 10 years of operation. Because the user base is so large and the site is UGC-driven, every user can generate personalized content, and the information that spreads inevitably includes a great deal of unhealthy (pornographic, violent, reactionary) content, advertisements, and fraudulent information that disgusts users, infringes the rights of normal users, or even constitutes a crime. In the Internet industry, such junk content is called "spam", those who publish it are called "cheaters", and when a normal user sees spam while browsing a page, that browsing behavior is said to be "polluted".
To identify spam quickly and accurately among millions of records, and to ensure that normal users can conveniently get what they need from the site's Web services, we need to obtain, in a timely and accurate manner, the proportion of spam on the site (the spam rate), the extent of its impact on normal users (the contamination rate), and the common features of the spam and the types it belongs to.
In this way, spam can be cleared promptly and efficiently, reducing its negative impact on normal users and improving the user experience with less manpower. At present, taking Tieba as an example, a manual assessment of the spam rate takes one person about 1 day, including acquiring the assessment data, reviewing it by hand, and calculating the rate manually. The contamination rate cannot be assessed at all. Compiling statistics on the common features of spam (such as text ads, pictures, etc.) and classifying them takes a few more days. This way of assessing is inefficient, and the accuracy of its results is not ideal. Because the manual approach cannot guarantee timeliness, engineers cannot grasp the spam rate and the characteristics of new forms of spam promptly and correctly, and therefore cannot adjust their strategies for clearing spam promptly and efficiently. By the time the figures are tallied, cheaters have already published a large amount of spam on the site and caused irreparable damage to the rights of normal users. Spam can greatly harm the normal user's experience and may even lead to a loss of users.
To solve these problems of long evaluation cycles, high cost, and low accuracy, this thesis implements a fast and accurate Web-based evaluation system for Tieba spam. The system adopts the B/S (browser/server) model and an MVC framework structure, and is designed and implemented as an integrated system for data extraction, evaluation, and statistical reporting on a PHP + MySQL + Apache + Linux architecture.
The evaluation system greatly simplifies the cumbersome manual assessment process and shortens the assessment cycle from 2 days to 2 hours. It automates the acquisition of evaluation data and the computation of statistics, and each evaluation index is generated automatically in a report. At the same time, the system ensures the accuracy of the evaluation results. It gives engineers accurate and comprehensive data support so that they can grasp the spam situation in time and devise strategies to clear the spam.
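
The two headline indices named in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation (which is built on PHP and MySQL); it assumes that the spam rate is the share of sampled posts judged to be spam and that the PV contamination rate is the share of page views that landed on spam posts. The `ReviewedPost` record, its field names, and the sample numbers are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class ReviewedPost:
    """One sampled post after review (hypothetical record layout)."""
    post_id: int
    is_spam: bool
    page_views: int  # page views (PV) the post received in the sample window


def spam_rate(posts):
    """Assumed definition: spam posts divided by all sampled posts."""
    if not posts:
        return 0.0
    return sum(p.is_spam for p in posts) / len(posts)


def pv_contamination_rate(posts):
    """Assumed definition: page views on spam posts divided by all page views."""
    total_pv = sum(p.page_views for p in posts)
    if total_pv == 0:
        return 0.0
    spam_pv = sum(p.page_views for p in posts if p.is_spam)
    return spam_pv / total_pv


# Illustrative sample, not data from the thesis.
sample = [
    ReviewedPost(1, False, 600),
    ReviewedPost(2, True, 50),
    ReviewedPost(3, False, 300),
    ReviewedPost(4, True, 50),
]

print(f"spam rate: {spam_rate(sample):.2%}")
print(f"PV contamination rate: {pv_contamination_rate(sample):.2%}")
```

In this sample half of the posts are spam, yet they draw only a tenth of the page views, which shows why the two indices are reported separately: the first measures how much spam exists, the second how much of it normal users actually see.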
Keywords/Search Tags:spam, evaluation system, spam rate, PV contamination rate