Topic Model Based Safe Search Filtering Techniques

Posted on: 2020-05-09
Degree: Master
Type: Thesis
Country: China
Candidate: G J Zhou
Full Text: PDF
GTID: 2428330590496399
Subject: Information and Communication Engineering
Abstract/Summary:
Network security is an important part of information security, and the security inspection and evaluation of network information has become an important requirement that plays an increasingly important role in maintaining security and social stability. After experiencing network harms such as online fraud, vulgar content, and rumors, many countries no longer accept the anarchic free development of the Internet and have introduced network censorship to bring the Internet into a framework of public control and management. As a result, information filtering technology related to network censorship has become an increasingly important research direction. Network censorship mainly relies on techniques such as URL masking, keyword filtering, text categorization, and deep packet inspection. At present these techniques are applied mainly to spam email and file monitoring, and there is little research on their application to search engine filtering. Compared with other network services, search engines have greater control over information in terms of the depth and breadth of what they provide. Because search engines have become an important access point to network information for the public, the study of information filtering in search engines has research significance and considerable application potential.

The primary goal of this thesis is the online filtering of sensitive information. To realize online filtering of sensitive information in search engines, the thesis studies web page parsing in an HTTPS proxy and text filtering of search content. Because the entries returned by a search engine are relatively independent of one another in the HTML source text, a message processing method based on the TCP data stream is proposed. The method improves message response speed, reduces the proxy server's dependence on memory, and balances resource utilization.

In addition, the short texts returned by a search engine are brief and have sparse features. To address this problem, a feature extension method is proposed that combines document topic similarity with the semantic information of the short text itself. Experimental results show that the proposed method not only makes full use of topic similarity but also greatly improves text classification efficiency after feature extension, which meets the search filtering platform's requirement for real-time monitoring and processing of network content.

Within the framework of an HTTPS transparent proxy, this thesis presents a sensitive information filtering scheme for search engines. The algorithms proposed in the thesis, including the feature extension algorithm, the classification algorithm, and the network text analysis algorithm, are applied to a search response filtering platform. Experimental results and a performance analysis of the filtering platform show that the proposed scheme meets practical application requirements. The work provides a feasible research approach and a technically sound route for the online prevention and control of sensitive information, and for the research and application of feature extension technology in short text analysis. It also offers useful insights for future research and related applications in this area.
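
The abstract notes that the individual results returned by a search engine are relatively independent in the HTML source, which is what allows a proxy to process a response as the data stream arrives rather than after buffering the whole page. The following is a minimal sketch of that general idea in Python and is not the thesis's implementation. The end-of-entry marker `ITEM_END`, the function `filter_stream`, and the placeholder check `contains_blocked_word` are all hypothetical names chosen for illustration; a real platform would use the page's actual markup and a trained classifier.

```python
from typing import Callable, Iterable, Iterator

# Hypothetical marker that closes one search-result entry.
# Real result pages use different markup; this is an assumption.
ITEM_END = b"</li>"


def filter_stream(
    chunks: Iterable[bytes],
    is_sensitive: Callable[[bytes], bool],
) -> Iterator[bytes]:
    """Forward complete entries as soon as they arrive, dropping flagged ones.

    Only the bytes of the current, still incomplete entry are held in
    memory, so the full response body is never buffered.
    """
    buffer = b""
    for chunk in chunks:
        buffer += chunk
        # Emit every entry that is now complete in the buffer.
        while True:
            end = buffer.find(ITEM_END)
            if end == -1:
                break
            cut = end + len(ITEM_END)
            item, buffer = buffer[:cut], buffer[cut:]
            if not is_sensitive(item):
                yield item
    # Pass through whatever follows the last complete entry.
    if buffer:
        yield buffer


def contains_blocked_word(item: bytes) -> bool:
    """Placeholder for a real text classifier (assumption)."""
    return b"scam" in item
```

Feeding the generator the chunks `b"<li>good ne"`, `b"ws</li><li>a sc"`, and `b"am</li><li>ok</li></ul>"` yields the first and third entries followed by the trailing `</ul>`, even though entry boundaries do not line up with chunk boundaries. The trade-off of this design is that each entry is judged in isolation, which suits the short, sparse result texts the abstract describes but gives the classifier no page-level context.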
Keywords/Search Tags: Network security, search filtering, topic model, short text classification, feature extension