The Research On The Application Of Rough Set Theory In The Web Information Filtering

Posted on: 2006-01-27, Degree: Master, Type: Thesis
Country: China, Candidate: A H Zhu, Full Text: PDF
GTID: 2168360155465844, Subject: Signal and Information Processing
Abstract/Summary:
The development of the Internet has greatly promoted the exchange of information. People can share and find a rich variety of information on the network. At the same time, pornography, violence, cult propaganda and other harmful information have become increasingly rampant online, and pornographic content is especially widespread. How to prevent such unhealthy information from spreading over the network has therefore become a research hotspot in network security. Web content analysis filtering classifies web pages by comprehensively analyzing the content of the pages that users browse. This technique achieves a higher recognition accuracy and avoids the weakness of database-based identification methods, since no database has to be updated frequently. The main problem of content-analysis filtering today is how to improve filtering speed and practicality while keeping accuracy acceptable, which is also one of the key problems to be solved in the field of network information security.

Rule-based Web information filtering is easy to understand and filters information quickly, which makes it suitable for filtering large volumes of text. In this paper, we put forward a Web information filtering technique that obtains its rules by means of Rough set theory. After an in-depth study of Rough set theory, we explain that it is a mathematical tool for knowledge discovery from vague and uncertain data. On this basis, we study in depth the discretization of continuous attributes and several attribute reduction algorithms based on Rough set theory.

For attribute discretization, we mainly discuss the Naive Scaler algorithm, the Semi Naive Scaler algorithm, and an algorithm that combines logical operations with Rough set theory. We also compare the three discretization algorithms.
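
To make the discretization step concrete, the following is a minimal sketch of a naive cut-point scheme in the spirit of the Naive Scaler, not the implementation used in the thesis. It assumes the commonly described rule: sort the distinct values of one continuous attribute and place a cut at the midpoint between two neighbouring values unless both values belong to the same single decision class. The function names and the sample data (a keyword count per page with `ok`/`block` decisions) are hypothetical.

```python
from collections import defaultdict

def naive_cuts(values, decisions):
    """Return cut points for one continuous attribute (simplified naive scheme)."""
    # Collect the set of decision classes observed for each distinct value.
    classes_by_value = defaultdict(set)
    for v, d in zip(values, decisions):
        classes_by_value[v].add(d)

    ordered = sorted(classes_by_value)
    cuts = []
    for lo, hi in zip(ordered, ordered[1:]):
        # Skip the cut only when both neighbours share one single decision class.
        same_single_class = (
            len(classes_by_value[lo]) == 1
            and classes_by_value[lo] == classes_by_value[hi]
        )
        if not same_single_class:
            cuts.append((lo + hi) / 2)
    return cuts

def discretize(value, cuts):
    """Map a value to the index of the interval it falls into."""
    return sum(value > c for c in cuts)

# Hypothetical data: keyword counts per page and the filtering decision.
counts = [1, 2, 4, 7, 9]
labels = ["ok", "ok", "block", "block", "ok"]
cuts = naive_cuts(counts, labels)
print(cuts, [discretize(v, cuts) for v in counts])
```

A scheme like this tends to produce many cuts, which is one reason the choice of discretization algorithm can affect the later attribute reduction.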
The comparison of discretization algorithms shows that different data sets call for different algorithms, and that using a different method can lead to very different attribute reduction results.

Attribute reduction based on Rough set theory is the main focus of the paper. We discuss three reduction algorithms: an algorithm based on the discernibility matrix and logical operations, a backtracking logical discernibility matrix reduction algorithm, and an improved heuristic reduction method. We then discuss the advantages and disadvantages of each. Of these, the backtracking logical discernibility matrix reduction algorithm and the improved heuristic reduction method are my own improvements on the basic methods.

Applying the improved heuristic reduction algorithm to Web information filtering is the main innovation of the paper. We first put forward a Rough set model of the Web information system, then obtain discretized attribute values through the discretization processing model, and finally reduce the attributes and derive the decision rules. Detailed flow charts are given for both the discretization process and the rule extraction. Furthermore, when extracting Web information features, we take into account the website layout and the PICS rating levels, which is another novel aspect of the paper.

Testing shows that applying Rough set theory to Web information filtering is effective. The rules obtained by this method are easy to understand and improve the speed and practicality of information filtering.

Finally, I summarize the whole paper, point out what should be improved, and outline the direction of future work.
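
As an illustration of discernibility-matrix-based reduction, the sketch below builds the discernibility entries of a small decision table and then picks attributes greedily by how many entries they cover. This is a generic frequency heuristic shown for orientation only; it is not the improved heuristic or the backtracking algorithm of the thesis, and it does not guarantee a minimal reduct. The attribute names (`keywords`, `images`, `pics`) and the table values are hypothetical.

```python
from itertools import combinations

def discernibility_entries(rows, decisions):
    """For each pair of objects with different decisions, list the attributes that differ."""
    entries = []
    for i, j in combinations(range(len(rows)), 2):
        if decisions[i] != decisions[j]:
            diff = frozenset(a for a in rows[i] if rows[i][a] != rows[j][a])
            if diff:  # an empty set would mean the table is inconsistent for this pair
                entries.append(diff)
    return entries

def greedy_reduct(entries):
    """Pick attributes until every discernibility entry contains a chosen one."""
    remaining = list(entries)
    reduct = []
    while remaining:
        # Count how many uncovered entries each attribute appears in.
        counts = {}
        for entry in remaining:
            for a in entry:
                counts[a] = counts.get(a, 0) + 1
        # Most frequent attribute first; ties are broken alphabetically.
        best = max(sorted(counts), key=counts.get)
        reduct.append(best)
        remaining = [e for e in remaining if best not in e]
    return reduct

# Hypothetical discretized decision table for four web pages.
rows = [
    {"keywords": 0, "images": 0, "pics": 0},
    {"keywords": 1, "images": 0, "pics": 0},
    {"keywords": 1, "images": 1, "pics": 1},
    {"keywords": 0, "images": 1, "pics": 0},
]
decisions = ["allow", "block", "block", "allow"]
print(greedy_reduct(discernibility_entries(rows, decisions)))
```

In this toy table a single attribute separates the two decision classes, so a decision rule could be read off the reduced table directly, for example "if `keywords` is 1 then block".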
Keywords/Search Tags: Rough set, reduction of attributes, reduction of values, discretization, discernibility matrix, decision rules