
Research On The Unspecified Event Detection Method In Twitter

Posted on: 2018-11-20 | Degree: Master | Type: Thesis
Country: China | Candidate: H Y Pu | Full Text: PDF
GTID: 2348330515951716 | Subject: Information and Communication Engineering
Abstract/Summary:
The rise and development of social networks have brought great convenience and change to people's lives. Mainstream social networking sites have more than one million registered users; well-known examples worldwide include Facebook, Twitter, and Weibo. Social networks spread a large amount of useful information, and some of them, such as Twitter, open their APIs to general users so that large-scale social network data can be accessed quickly and conveniently. These factors have led to rapid growth in data mining research on Twitter and other platforms. Event detection, one of the active tasks in data mining, is generally divided into two types: specified event detection and unspecified event detection. This thesis studies unspecified event detection in Twitter by processing English tweet text. The research focuses on short-text semantic similarity calculation, unspecified event detection, and the implementation of an online unspecified event detection system. The main work and innovations of this thesis are as follows:

(1) A short-text semantic similarity calculation method that combines knowledge-based and corpus-based methods is proposed. It builds on an improved word semantic similarity calculation method and a general short-text semantic similarity calculation method. The word similarity method combines two word semantic similarity measures through a set of strategies. It draws on the strengths of both to overcome the weaknesses of either one alone, finds more semantic associations among the words in texts, and improves the accuracy of word similarity calculation. Several word and text semantic similarity algorithms are compared and analyzed on a large number of corpora; the improved method produces results closer to human labels than the other methods for both word and short-text similarity.

(2) An unspecified event detection method based on incremental text clustering is proposed. This thesis mainly improves the incremental short-text clustering method used in event detection. The improved method comprises greedy clustering, re-clustering, cluster merging, cluster trimming, and optional semantic similarity calculation. It targets problems in existing clustering methods, such as the lack of semantic similarity calculation, sensitivity to input order, and poor incremental clustering performance. In addition, a method for event discrimination and event description based on features of the clustering result is proposed. In tests on real tweet data, the improved clustering method clearly outperforms the original method in both clustering performance and order effect. The recall and accuracy of the event detection method also meet application requirements.

(3) A Twitter-based unspecified event detection system is designed and implemented. The system consists mainly of semantic similarity calculation, unspecified event detection, text preprocessing, and a graphical interface. To handle the large volume of social network data, optimization methods are proposed for each module of the system. In functional tests and optimization performance comparison tests on real tweet data, the system functions correctly, and the module optimizations greatly improve performance, so that the system can meet the requirements of online event detection.
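
To make the greedy clustering step in (2) concrete, the following is a minimal sketch and not the thesis's actual implementation. It assumes a single pass over the tweets, a bag-of-words cosine similarity standing in for the semantic similarity measure, and an arbitrary threshold of 0.5. The names `cosine` and `greedy_incremental_cluster` are hypothetical, and the re-clustering, merging, and trimming stages are not shown.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def greedy_incremental_cluster(texts, threshold=0.5, similarity=cosine):
    """Assign each text, in arrival order, to its most similar cluster.

    A new cluster is opened when no existing cluster reaches `threshold`.
    `similarity` is pluggable, so a semantic measure could replace the
    bag-of-words cosine assumed here.
    """
    clusters = []  # each cluster: {"centroid": Counter, "members": [indices]}
    for index, text in enumerate(texts):
        vector = Counter(text.lower().split())
        best, best_score = None, 0.0
        for cluster in clusters:
            score = similarity(vector, cluster["centroid"])
            if score > best_score:
                best, best_score = cluster, score
        if best is not None and best_score >= threshold:
            # Fold the new text into the centroid (summed term counts).
            best["centroid"].update(vector)
            best["members"].append(index)
        else:
            clusters.append({"centroid": Counter(vector), "members": [index]})
    return clusters


if __name__ == "__main__":
    tweets = [
        "earthquake hits city",
        "earthquake hits the city",
        "football final tonight",
    ]
    for cluster in greedy_incremental_cluster(tweets):
        print(cluster["members"])
```

Because each text is assigned as soon as it arrives, the outcome of a single greedy pass can depend on input order, which is the kind of order effect the abstract says the later re-clustering, merging, and trimming steps are meant to reduce.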
Keywords/Search Tags: short text, semantic similarity, incremental clustering, unspecified event detection