
Design And Implementation Of A Logistic-Regression-based Sensitive Content Detection System

Posted on: 2017-02-03
Degree: Master
Type: Thesis
Country: China
Candidate: J B Xie
GTID: 2428330569485082
Subject: Software engineering
Abstract/Summary:
With the continuing spread of information technology and the rapid development of the Internet, more and more people interact online, for example through online news, e-commerce, and online registration. These services make daily life and work more convenient and improve the efficiency of many industries and departments. However, some malicious actors exploit the openness, dynamism, and vulnerability of network technology, so Internet problems are becoming more serious and can even disrupt normal social order. People increasingly rely on websites as sources of information. Information published on government websites has always been regarded as authoritative and is trusted and accepted by most people; once a government website is compromised and its users are affected, that authority is called into question. Web pages take many forms, and many websites share one problem: sensitive content appearing on their pages. Such content (violence, cults, fraud, pornography, gambling, and so on) can harm the user's browsing experience, so back-end administrators have to detect and filter it. That job also requires a corresponding professional background in language, yet there are no explicit rules for sensitive words to follow.

This thesis therefore proposes a logistic-regression-based model that learns to filter text information. To address the limitations of existing schemes, it extracts features from page text through lexical analysis of samples and, using text analysis operations such as word-frequency counting and weighting, identifies pages that contain sensitive content. In addition, the system periodically adds new text and rebuilds the model so that it adapts to new Internet terms. With this approach as the core module, a sensitive content detection system is designed and implemented. The system is developed in Python, uses a B/S (browser/server) architecture, and uses MySQL as the back-end database. The system can effectively improve the efficiency of screening sensitive content, make full use of back-end computing resources, and greatly reduce the cost of back-end detection. Learning text features is a new trend in text content detection. A filtering system that combines online detection with offline learning provides a technical guarantee for a better network environment for Internet users.
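The abstract names the ingredients of the filter (lexical analysis, word frequency, weighting, logistic regression, and periodic rebuilding of the model) but not their exact form. The following is a minimal sketch of how such a pipeline could look, not the thesis's implementation. Everything specific in it is an assumption made for illustration: the TF-IDF weighting, the stochastic gradient descent training loop, the 0.5 decision threshold, the regex tokenizer standing in for real lexical analysis, all function names, and the toy English samples.

```python
import math
import re
from collections import Counter

# Toy training pages invented for this sketch (1 = sensitive, 0 = normal).
SAMPLES = [
    "win money now at our online casino gambling bonus",
    "cheap pills fraud scheme send money fast",
    "casino betting jackpot gambling tonight",
    "city council publishes new public transport schedule",
    "government announces online registration for school enrollment",
    "local news weather report and transport update",
]
LABELS = [1, 1, 1, 0, 0, 0]


def tokenize(text):
    """Assumed stand-in for lexical analysis: lowercase alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def fit_idf(docs):
    """Smoothed inverse document frequency for every term in the corpus."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}


def vectorize(tokens, idf):
    """Word frequency times IDF weight; terms unseen in training are dropped."""
    counts = Counter(t for t in tokens if t in idf)
    total = sum(counts.values()) or 1
    return {t: (c / total) * idf[t] for t, c in counts.items()}


def predict_proba(x, weights, bias):
    """Logistic function applied to the weighted sum of the features."""
    z = bias + sum(weights.get(t, 0.0) * v for t, v in x.items())
    z = max(-30.0, min(30.0, z))  # clamp to keep math.exp well behaved
    return 1.0 / (1.0 + math.exp(-z))


def train(samples, labels, epochs=500, lr=0.5):
    """Fit logistic regression by stochastic gradient descent on log loss."""
    docs = [tokenize(s) for s in samples]
    idf = fit_idf(docs)
    xs = [vectorize(d, idf) for d in docs]
    weights, bias = {}, 0.0
    for _ in range(epochs):
        for x, y in zip(xs, labels):
            err = predict_proba(x, weights, bias) - y
            bias -= lr * err
            for t, v in x.items():
                weights[t] = weights.get(t, 0.0) - lr * err * v
    return idf, weights, bias


def score(text, model):
    """Online detection: estimated probability that a page is sensitive."""
    idf, weights, bias = model
    return predict_proba(vectorize(tokenize(text), idf), weights, bias)


model_v1 = train(SAMPLES, LABELS)
print(score("gambling casino jackpot bonus", model_v1) > 0.5)
print(score("public transport schedule news", model_v1) < 0.5)

# Offline relearning: newly collected text is added and the model is rebuilt,
# so a term the first model had never seen ("lottery") becomes a usable feature.
NEW_SAMPLES = ["lottery scam lottery prize"]
NEW_LABELS = [1]
model_v2 = train(SAMPLES + NEW_SAMPLES, LABELS + NEW_LABELS)
print(score("lottery prize", model_v2) > 0.5)
```

In this sketch the first model silently drops unseen words, which is one way to read the abstract's motivation for rebuilding the model at intervals: detection stays cheap online, while vocabulary growth is handled by retraining offline on the enlarged sample set.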
Keywords/Search Tags:Lexical analysis, B/S architecture, Text detection