
Research On Key Techniques Of Internet Content Supervision System

Posted on: 2006-12-15
Degree: Doctor
Type: Dissertation
Country: China
Candidate: L L Dai
GTID: 1118360152965788
Subject: Pattern recognition and artificial intelligence
Abstract/Summary:
In response to the worsening pollution of the Internet by harmful information, this dissertation examines the key techniques of an Internet content supervision system. The main contributions are as follows:
(1) A double-tiered supervision model is proposed. A high-speed multi-keyword matching algorithm and a keyword-expression matching algorithm form the first tier, and an SVM-based text categorization algorithm forms the second tier. This model improves both the throughput and the recognition precision for sensitive information.
(2) An efficient multi-keyword matching algorithm named QMS is devised using a greedy strategy based on the average shift distance. QMS is then combined with the classical counting algorithm to improve the efficiency of keyword-expression matching.
(3) Feature selection methods that perform well in English text categorization are found to be unsuitable for Chinese text. The reason for the difference is analyzed, and a new method named Combined Feature Selection is proposed. The new method improves classification efficiency and accelerates classifier training.
(4) A new algorithm for training SVMs named 3SAO is devised. 3SAO breaks the original quadratic programming (QP) problem of training an SVM into a sequence of sub-QP problems. Each sub-QP problem involves three Lagrange multipliers and is optimized analytically. 3SAO also uses an effective but extremely simple set of heuristics for choosing multipliers. Test results show that 3SAO converges very quickly.
(5) A new text categorization algorithm based on knowledge fusion, named Semantic Support Vector Machines (Semantic SVMs), is proposed. Semantic SVMs replace the original training text set with semantic centers as support vector candidates. While retaining precision, Semantic SVMs significantly speed up training and classification and also have good online learning ability.
(6) A prototype of a positive Internet content supervision system is designed and implemented. Tests show that the supervision efficiency and the recognition precision for sensitive information are satisfactory.
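The double-tiered model in item (1) can be illustrated with a minimal Python sketch. It is not the dissertation's implementation: plain substring search stands in for QMS, a hand-set linear decision function stands in for the trained SVM, and whitespace tokenization is assumed (it would not work for Chinese text). The names `match_expressions`, `linear_score`, and `supervise` are hypothetical. The first tier follows the general idea of counting-based matching, where an expression matches once every one of its keywords has been seen.

```python
from collections import defaultdict


def match_expressions(text, expressions):
    """Tier 1: counting-style keyword-expression matching.

    `expressions` maps an expression id to the keywords that must ALL
    occur in `text`. An expression matches when its hit counter reaches
    the number of distinct keywords it contains.
    """
    # Inverted index: keyword -> ids of the expressions that contain it.
    index = defaultdict(list)
    for expr_id, keywords in expressions.items():
        for kw in set(keywords):
            index[kw].append(expr_id)

    counts = defaultdict(int)
    for kw, expr_ids in index.items():
        # Assumption: substring search replaces a fast multi-keyword matcher.
        if kw in text:
            for expr_id in expr_ids:
                counts[expr_id] += 1

    return {
        expr_id
        for expr_id, keywords in expressions.items()
        if keywords and counts[expr_id] == len(set(keywords))
    }


def linear_score(text, weights, bias):
    """Tier 2 stand-in: decision value w.x + b over bag-of-words counts.

    A trained linear SVM computes a value of this form; the weights used
    here are hand-set for illustration, not learned.
    """
    tokens = text.lower().split()
    return sum(weights.get(tok, 0.0) for tok in tokens) + bias


def supervise(text, expressions, weights, bias):
    """Run the cheap tier first; only flagged text reaches the classifier."""
    hits = match_expressions(text.lower(), expressions)
    if not hits:
        return False, hits  # nothing matched in tier 1, so tier 2 is skipped
    return linear_score(text, weights, bias) > 0, hits
```

The point of the ordering is that most traffic is rejected by the inexpensive first tier, so the costlier classifier runs only on the small fraction of text that matched a keyword expression.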
Keywords/Search Tags: Internet information pollution, content supervision, multi-keyword matching, keyword-expression matching, text categorization, feature selection, Support Vector Machine, analytic optimization, Semantic SVM