
Research On BBS Content Supervision Technology Based On Active Search

Posted on: 2012-05-09
Degree: Master
Type: Thesis
Country: China
Candidate: L Q Geng
GTID: 2218330368482639
Subject: Computer application technology

Abstract/Summary:
With the growing popularity of the Internet, it has become an indispensable information medium. At the same time, harmful online content, such as reactionary material and the proliferation of pornography, seriously threatens national stability and people's health. The forum (BBS) is a common form of Internet application and is very convenient for users, but it also faces the problem of harmful information. To maintain a healthy online culture and environment, forum content must be monitored.

In practice, forum content can be regulated in two ways: active mode and passive mode. The active mode has its own advantages, and this thesis focuses on two problems that the active mode faces.

First, the active mode uses Web crawler technology to obtain forum pages as the raw content for regulation. However, some forums require users to log in before they can view content, so a crawler retrieves only the login page, which is useless for content regulation. To solve this problem, the thesis analyzes the user login process and presents a method that combines forum Cookies with a Web crawler. Using authenticated Cookies, the method retrieves restricted page content from forums in a largely automated way. Experiments demonstrate the feasibility of the approach.

Second, while the Web crawler is running, duplicate URLs must be removed quickly and efficiently so that the same page is not downloaded repeatedly, and hashing is an important research direction for this task. Starting from the K-Picked hash algorithm, the thesis studies its principle and its shortcomings and proposes an improved scheme. By expanding the range of ordinary characters, increasing the dispersion of the divisor, and randomizing the K discrete values, the improved algorithm achieves relatively good results, as shown by a series of experiments.
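The Cookie-based crawling idea described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the abstract does not say how the authenticated Cookies are obtained or what the login form looks like, so the scripted form login, the `/login` path, the `username`/`password` fields, the `sid` cookie and the toy forum server are all hypothetical. The crawler logs in once, lets Python's standard `http.cookiejar.CookieJar` keep the session cookie, and reuses it for later page requests.

```python
import http.cookiejar
import threading
import urllib.parse
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# --- Crawler side -----------------------------------------------------------
# ASSUMPTION: the forum logs users in via a form POST with the hypothetical
# fields "username"/"password" and replies with a Set-Cookie session header.

def make_authenticated_opener(login_url: str, username: str, password: str):
    """Log in once and return an opener whose CookieJar holds the session cookie."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(
        urllib.request.ProxyHandler({}),          # ignore *_proxy env vars (demo talks to localhost)
        urllib.request.HTTPCookieProcessor(jar),  # stores Set-Cookie, sends Cookie on later requests
    )
    form = urllib.parse.urlencode({"username": username, "password": password}).encode()
    with opener.open(login_url, data=form) as resp:  # passing data makes this a POST
        resp.read()
    return opener


def fetch_page(opener, url: str) -> str:
    """Fetch a page; the opener attaches any stored cookies automatically."""
    with opener.open(url) as resp:
        return resp.read().decode("utf-8")


# --- Toy forum for the demo (HYPOTHETICAL, stands in for a real BBS) ---------

class ToyForum(BaseHTTPRequestHandler):
    def _send(self, body: str, set_cookie: str = "") -> None:
        data = body.encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(data)))
        if set_cookie:
            self.send_header("Set-Cookie", set_cookie)
        self.end_headers()
        self.wfile.write(data)

    def do_POST(self) -> None:  # any POST is treated as the login form
        length = int(self.headers.get("Content-Length", 0))
        fields = urllib.parse.parse_qs(self.rfile.read(length).decode("utf-8"))
        if fields.get("password") == ["secret"]:
            self._send("welcome", set_cookie="sid=demo; Path=/")
        else:
            self._send("LOGIN PAGE")

    def do_GET(self) -> None:  # restricted content only for requests carrying the cookie
        if "sid=demo" in self.headers.get("Cookie", ""):
            self._send("thread content")
        else:
            self._send("LOGIN PAGE")

    def log_message(self, *args) -> None:  # silence per-request logging
        pass


def demo():
    """Start the toy forum, then fetch /thread anonymously and with the session cookie."""
    server = HTTPServer(("127.0.0.1", 0), ToyForum)  # port 0: the OS picks a free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    base = f"http://127.0.0.1:{server.server_port}"
    try:
        anonymous = urllib.request.build_opener(urllib.request.ProxyHandler({}))
        without_cookie = fetch_page(anonymous, base + "/thread")
        opener = make_authenticated_opener(base + "/login", "crawler", "secret")
        with_cookie = fetch_page(opener, base + "/thread")
    finally:
        server.shutdown()
        server.server_close()
    return without_cookie, with_cookie


if __name__ == "__main__":
    print(demo())  # ('LOGIN PAGE', 'thread content')
```

Without the cookie the toy forum returns only its login page, which is the useless result the abstract describes; the same request made through the logged-in opener returns the restricted content. A real crawler would also need to notice expired sessions, for example by checking whether a response is the login page, and log in again.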
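Hash-based duplicate-URL removal for the crawler frontier can be sketched as below. The abstract does not define the K-Picked hash or the improved variant, so `sampled_hash` is a hypothetical stand-in that only picks up to K evenly spaced characters of the URL and mixes them with a multiplier of 131 modulo a prime table size (100003); it should not be read as the thesis's algorithm. `URLSeenFilter` (also a hypothetical name) keeps the full URLs in each bucket, so a hash collision never causes a new URL to be skipped.

```python
from typing import Dict, List

# HYPOTHETICAL stand-in hash: samples up to k characters at evenly spaced
# positions. It is NOT the thesis's K-Picked algorithm or its improved version.
def sampled_hash(url: str, k: int = 16, table_size: int = 100_003) -> int:
    """Map url to a bucket index in [0, table_size) using at most k of its characters."""
    n = len(url)
    if n <= k:
        picked = url                                         # short URL: use every character
    else:
        picked = "".join(url[i * n // k] for i in range(k))  # k evenly spaced positions
    h = n % table_size                                       # seed with the length
    for ch in picked:
        h = (h * 131 + ord(ch)) % table_size                 # polynomial mixing, reduced mod table size
    return h


class URLSeenFilter:
    """Hash table with separate chaining that remembers every URL already queued."""

    def __init__(self, k: int = 16, table_size: int = 100_003) -> None:
        self.k = k
        self.table_size = table_size
        self.buckets: Dict[int, List[str]] = {}              # bucket index -> URLs stored there

    def add_if_new(self, url: str) -> bool:
        """Record url and return True if unseen; return False if it is a duplicate."""
        bucket = self.buckets.setdefault(sampled_hash(url, self.k, self.table_size), [])
        if url in bucket:  # full comparison, so collisions never drop a new URL
            return False
        bucket.append(url)
        return True


if __name__ == "__main__":
    seen = URLSeenFilter()
    frontier = [
        "http://bbs.example.com/thread-1.html",
        "http://bbs.example.com/thread-2.html",
        "http://bbs.example.com/thread-1.html",
    ]
    to_fetch = [u for u in frontier if seen.add_if_new(u)]
    print(to_fetch)  # ['http://bbs.example.com/thread-1.html', 'http://bbs.example.com/thread-2.html']
```

Picking only K characters keeps hashing cheap for long URLs, at the cost of more collisions between URLs that differ only at unpicked positions; the full-string comparison inside each bucket keeps the result correct, so collisions cost only extra comparisons.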
Keywords/Search Tags: content supervision, forum BBS, active mode, web crawler, duplicate URL removal