
Discovery And Extraction Of Tibetan Hotspot Event

Posted on: 2016-05-30
Degree: Master
Type: Thesis
Country: China
Candidate: W B Guo
Full Text: PDF
GTID: 2208330470962878
Subject: Computer software and theory
Abstract/Summary:
With the development and popularization of the Internet, some 485 million Internet users were already voicing their views through forums, blogs, Twitter and other online media. Their participation in a topic converges into online public opinion. By December 2014, the number of Internet users in China had reached 649 million. The Internet has become a channel through which the government can directly understand public sentiment. Only by discovering online information in a timely manner, quickly detecting and handling high-impact events, and quickly identifying and tracking them is it possible to obtain a faster and more complete picture of public opinion trends and to guide public opinion.

At present, there are many successful systems for Chinese and English TDT (Topic Detection and Tracking). However, for minority-language web content, there are few publicly usable systems for hot topic detection and tracking. The main reasons are that the research foundation for minority languages is weak and that the encoding of web text is complex and not uniform. Based on the characteristics of Tibetan web text, we designed and implemented a Tibetan topic discovery and extraction system for the network environment. The system extracts hot words and discovers and extracts events from 6 Tibetan websites. Finally, it extracts hot event features from different angles and presents them to users. On this basis, this paper studies the following methods:

(1) This paper proposes a hot word extraction method for the Tibetan network environment. The web data is preprocessed, segmented into words, counted for word frequencies, and frequency-weighted; the hot words are then extracted based on entropy and variance.

(2) We analyze the characteristics of network events and summarize a quantification method for hot events that computes a report index and a diffusion index. From these, we propose an event quantification formula.

(3) This paper summarizes the features of network events and extracts event labels from different angles. The text titles and event tag words are extracted and presented to users.
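The abstract does not give the formulas behind the entropy- and variance-based hot word step in (1), so the following is only a minimal sketch of one way such a step might look. It assumes that segmentation and frequency counting have already been done, and every name in it (`time_variance`, `source_entropy`, `hot_word_scores`, `top_hot_words`) is hypothetical. The choices to measure variance over daily counts, to measure entropy over per-website counts, and to multiply the two are assumptions made for illustration, not the method of the thesis.

```python
import math
from statistics import pvariance


def time_variance(daily_counts):
    """Population variance of a word's per-day counts (assumed burstiness signal)."""
    return pvariance(daily_counts)


def source_entropy(source_counts):
    """Shannon entropy, in bits, of a word's counts across websites (assumed spread signal)."""
    total = sum(source_counts)
    if total == 0:
        return 0.0
    probs = [c / total for c in source_counts if c > 0]
    return -sum(p * math.log2(p) for p in probs)


def hot_word_scores(daily, by_source):
    """Score each word as variance * entropy; the product is an illustrative assumption."""
    return {
        word: time_variance(daily[word]) * source_entropy(by_source[word])
        for word in daily
    }


def top_hot_words(daily, by_source, k):
    """Return the k highest-scoring words."""
    scores = hot_word_scores(daily, by_source)
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Placeholder tokens stand in for segmented Tibetan words.
daily = {"w_flat": [1, 1, 1, 1], "w_burst": [0, 0, 0, 8], "w_single": [0, 0, 0, 8]}
by_source = {"w_flat": [2, 2], "w_burst": [4, 4], "w_single": [8, 0]}
print(top_hot_words(daily, by_source, 1))
```

Under these assumptions, a word scores highly only if its frequency changes sharply over time and it appears on more than one website. A word with a flat frequency or a word confined to a single site scores zero.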
Keywords/Search Tags: TDT, Hot Word Extraction, Event Quantification, Event Label Extraction