Research On The Key Technologies Of Information Mining Oriented To Network Content Security

Posted on: 2013-05-19  Degree: Doctor  Type: Dissertation
Country: China  Candidate: Z Gong  Full Text: PDF
GTID: 1108330482962921  Subject: Information security
Abstract/Summary:
With the popularity of the Internet and the rapid emergence of new Internet applications, content-based information misuse is becoming increasingly serious. It poses a major challenge to network content security and threatens state security and social stability. Information mining oriented to network content security has become a hot research topic in information security. It studies how to use computers to automatically obtain, identify and analyze content on specific security topics from an Internet flooded with massive, ever-changing information. Network content security requires effective monitoring, thorough analysis and active response to content information, which in turn demands greater depth and breadth of technical research. Focusing on the key technologies of information mining oriented to network content security, this dissertation carries out exploratory research on several hot and difficult issues, offers solutions, and combines the related theories and technologies to design a public opinion system platform that runs around the clock. The main contributions and innovations are as follows:

(1) We propose an automatic hot topic identification method based on the ant colony clustering algorithm. Inspired by the self-organizing behavior characteristic of swarm intelligence, the dissertation applies ant colony clustering to hot topic identification. The basic ant colony text clustering (BACTC) algorithm has two obvious shortcomings: first, it is hard to converge in its later stages; second, the ants move too blindly. The dissertation therefore puts forward the IACTC algorithm, which overcomes these shortcomings by optimizing the probability transfer function, adding memory organs, and changing the moving strategy. A method for extracting topic abstracts after clustering is also discussed.
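The abstract does not give the probability transfer function used by BACTC or IACTC. As a rough illustration of what such a function looks like in basic ant colony clustering, the sketch below uses the commonly cited Deneubourg-style pick-up and drop probabilities with a Lumer-Faieta-style local density. The constants `k1`, `k2` and `alpha`, the cosine distance over term-weight dictionaries, and the averaging over the number of neighbors (rather than the neighborhood area) are all assumptions made for this example, not the dissertation's definitions.

```python
import math

def cosine_distance(a, b):
    """Return 1 - cosine similarity of two sparse term-weight dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    if na == 0.0 or nb == 0.0:
        return 1.0
    return 1.0 - dot / (na * nb)

def local_density(doc, neighbors, alpha=0.5):
    """Average similarity of doc to nearby documents, clipped at 0.

    alpha (assumed value) scales the distance: neighbors farther than
    alpha contribute negatively, so dissimilar surroundings give 0.
    """
    if not neighbors:
        return 0.0
    total = sum(1.0 - cosine_distance(doc, n) / alpha for n in neighbors)
    return max(0.0, total / len(neighbors))

def pick_probability(f, k1=0.1):
    """An unladen ant picks up a document that fits its surroundings poorly."""
    return (k1 / (k1 + f)) ** 2

def drop_probability(f, k2=0.15):
    """A laden ant drops its document where similar documents are dense."""
    return (f / (k2 + f)) ** 2

# Hypothetical documents as term-weight dicts.
quake = {"quake": 1.0, "rescue": 1.0}
sport = {"football": 1.0}
f_similar = local_density(quake, [dict(quake), dict(quake)])
f_foreign = local_density(quake, [sport, sport])
```

In this basic form an ant surrounded by unrelated documents almost always picks up and almost never drops, which is consistent with the blind movement and slow late-stage convergence that the abstract attributes to BACTC.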
On real-world web data sets, the IACTC algorithm was tested and compared with other algorithms. The results show that IACTC performs better in cluster discovery, convergence and accuracy.

(2) The dissertation puts forward an adaptive Single-Pass algorithm based on a sliding time window (ASP-SW) to address the difficulties of the topic tracking task. Traditional topic tracking and existing adaptive topic tracking techniques share several shortcomings, including excessive topic excursion, mistaken topic drift and false feedback. ASP-SW introduces a latent semantic model based on pLSA, constructing and updating the topic model at the semantic layer. It also reduces the effect of negative samples. Meanwhile, the sliding time window removes the influence of old data on topic modeling, improving tracking accuracy. ASP-SW adopts a "Clustering Threshold" policy (comprising an experience threshold and dynamic regulation of the threshold) to decide how documents are clustered and whether they take part in the topic model calculation. Through the dynamically regulated threshold, ASP-SW adapts to the topic-time declining factor. Treating new word discovery as part of topic tracking, the dissertation also puts forward an Internet new word discovery algorithm based on scattered word matching (n-scattered). Experiments show that ASP-SW achieves satisfactory topic tracking results and that n-scattered effectively discovers Internet new words and hot words.

(3) The dissertation proposes a community discovery model (CTIM) and a community discovery algorithm (CD-CTIM) for social media. It analyzes users, documents, communities and topics in social media and, based on the close relationships among them, proposes the Community-Topic Interacting Model (CTIM).
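Looking back at contribution (2), the interaction between Single-Pass clustering and a sliding time window can be sketched as follows. This is a simplified illustration rather than ASP-SW itself: it uses raw term counts instead of a pLSA topic model, a single fixed `threshold` instead of the experience threshold plus dynamic regulation, and assumes the stream arrives sorted by timestamp. The class and function names are hypothetical.

```python
import math
from collections import Counter, deque

def cosine(a, b):
    """Cosine similarity of two sparse term-count mappings."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class Topic:
    def __init__(self):
        self.members = deque()  # (timestamp, Counter), oldest first

    def add(self, ts, vec):
        self.members.append((ts, vec))

    def expire(self, now, window):
        # Slide the window: forget documents older than `window` time units.
        while self.members and now - self.members[0][0] > window:
            self.members.popleft()

    def centroid(self):
        # Topic model here is just the summed term counts inside the window.
        total = Counter()
        for _, vec in self.members:
            total.update(vec)
        return total

def single_pass(stream, threshold=0.3, window=3):
    """Assign each (timestamp, tokens) item to a topic index in one pass."""
    topics, labels = [], []
    for ts, tokens in stream:
        vec = Counter(tokens)
        for topic in topics:
            topic.expire(ts, window)
        best, best_sim = None, 0.0
        for i, topic in enumerate(topics):
            sim = cosine(vec, topic.centroid())
            if sim > best_sim:
                best, best_sim = i, sim
        if best is None or best_sim < threshold:
            topics.append(Topic())  # no topic is close enough: start a new one
            best = len(topics) - 1
        topics[best].add(ts, vec)
        labels.append(best)
    return labels

# Hypothetical stream of (timestamp, tokens).
stream = [
    (0, ["quake", "rescue"]),
    (1, ["quake", "rescue", "aid"]),
    (2, ["football", "final"]),
    (10, ["quake", "rescue"]),
]
short_window = single_pass(stream, window=3)
long_window = single_pass(stream, window=100)
```

With the short window, the last document arrives after the earlier earthquake reports have expired and therefore opens a new topic; with the long window it joins the original one. This is the effect of old data on the topic model that the sliding window is meant to control.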
The dissertation discusses the structure, properties and transfer functions of CTIM, and puts forward a community detection algorithm based on it (CD-CTIM). CD-CTIM constructs a user-topic bipartite network and applies a single-mode projection using CWP (contribution weighted projection). Based on the link weights in the resulting single-mode user network, the dissertation defines a modularity function for CTIM, Q_CTIM, and finds the community partition of end users by searching for the optimal solution of Q_CTIM. The dissertation tests the algorithm on data gathered from Sina micro-blog and shows that CD-CTIM achieves very satisfactory community detection on networks suited to the CTIM.

(4) The dissertation presents the design and development of an online public opinion monitoring and analysis platform (YQ Platform). The author built YQ Platform as team leader during the doctoral period. The dissertation describes the platform's overall architecture and details the technologies of its sub-modules, including information collection, information extraction and data storage.

In sum, the dissertation makes innovative contributions in several key research areas of content-based Internet information security, offering solutions and methods for the information mining problems of network content security.
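For contribution (3), the abstract defines neither the CWP weighting nor Q_CTIM, so the sketch below only illustrates the general pipeline: project a user-topic bipartite network onto users, then score a candidate partition with a modularity function. The projection weight (the sum over shared topics of the smaller of the two users' contributions) is an assumed stand-in for CWP, and the score is the standard weighted Newman modularity, not Q_CTIM. All names and data are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def project_users(contrib):
    """Project {user: {topic: contribution}} onto a weighted user-user graph.

    Assumed weighting: for each shared topic, add the smaller of the two
    users' contributions. Returns {(u, v): weight} with u < v.
    """
    by_topic = defaultdict(dict)
    for user, topics in contrib.items():
        for topic, w in topics.items():
            by_topic[topic][user] = w
    edges = defaultdict(float)
    for users in by_topic.values():
        for u, v in combinations(sorted(users), 2):
            edges[(u, v)] += min(users[u], users[v])
    return dict(edges)

def modularity(edges, community):
    """Weighted Newman modularity: sum over c of L_c/m - (d_c/(2m))^2."""
    m = sum(edges.values())
    if m == 0:
        return 0.0
    strength = defaultdict(float)   # weighted degree of each user
    internal = defaultdict(float)   # edge weight inside each community
    for (u, v), w in edges.items():
        strength[u] += w
        strength[v] += w
        if community[u] == community[v]:
            internal[community[u]] += w
    total = defaultdict(float)      # summed strength per community
    for node, s in strength.items():
        total[community[node]] += s
    return sum(internal[c] / m - (total[c] / (2 * m)) ** 2 for c in total)

# Hypothetical contributions of four users to three topics.
contrib = {
    "a": {"t1": 2, "t2": 1},
    "b": {"t1": 3},
    "c": {"t2": 1, "t3": 2},
    "d": {"t3": 4},
}
edges = project_users(contrib)
q_split = modularity(edges, {"a": 0, "b": 0, "c": 1, "d": 1})
q_single = modularity(edges, {"a": 0, "b": 0, "c": 0, "d": 0})
```

A community detection algorithm of this kind would search over partitions for the one with the highest score; here the two-community split scores higher than placing all users in one community.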
Keywords/Search Tags:Content Security, Web Data Mining, Hot Topic Detection, Swarm Intelligence, Topic Tracking, Social Network Analysis, Community Detection, Public Opinion Analysis