Study On Hot Topic Detection Based On The Analysis Of Tibetan Public Opinion

Posted on: 2011-03-15  Degree: Master  Type: Thesis
Country: China  Candidate: T Jiang  Full Text: PDF
GTID: 2178330332469954  Subject: Computer application technology
Abstract/Summary:
In the current social environment, and under increasingly complex web conditions, online public opinion has a major impact on social stability and on the many people who use the Internet. Public opinion here means the social and political attitudes that the public forms and holds toward those who manage society, arising around the occurrence, development, and change of hot topics and social events within a given social context. Unlike public opinion in general, online public opinion arises on a large scale and spreads quickly, and its outbreak points are difficult to detect and control, so discovering hot topics and monitoring public opinion on the network have become increasingly important. Monitoring systems for Chinese-language online public opinion already have research results behind them; for example, the Founder Technology Research Institute has released the Zhisi public opinion decision support system. Public opinion research for the Tibetan language, however, is still at a preliminary stage and lacks related work, because the overall level of Tibetan language information processing lags behind.
After reviewing the current state and development of Chinese and English public opinion analysis and of topic detection and identification, this thesis briefly introduces the models and algorithms used for topic detection. It then compares and analyzes several of these models and, taking the characteristics of Tibetan information processing into account, proposes a hot topic detection algorithm based on the analysis of Tibetan web public opinion. The core of the thesis describes the system in three parts: Tibetan topic detection, the hot topic detection algorithm, and hot topic presentation. Texts are represented with the vector space model, and incremental clustering is used for topic detection.
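The combination of a vector space model with incremental clustering can be sketched as follows. This is a minimal illustration, not the thesis's implementation: it assumes documents are already segmented into word tokens (Tibetan word segmentation is out of scope here), uses raw term frequencies as weights, compares vectors by cosine similarity, and uses an arbitrary threshold of 0.3. The function names `vectorize`, `cosine`, and `incremental_cluster` are hypothetical.

```python
import math
from collections import Counter


def vectorize(tokens):
    """Represent a tokenized document as a sparse term-frequency vector."""
    return Counter(tokens)


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(weight * b[term] for term, weight in a.items() if term in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def incremental_cluster(docs, threshold=0.3):
    """Single-pass clustering: each document joins the most similar topic
    if the similarity reaches `threshold` (an assumed value), otherwise it
    starts a new topic.

    Returns a list of topics, each a dict holding a summed term vector
    ("centroid") and the indices of its member documents ("members").
    """
    topics = []
    for index, tokens in enumerate(docs):
        vector = vectorize(tokens)
        best_topic, best_score = None, 0.0
        for topic in topics:
            score = cosine(vector, topic["centroid"])
            if score > best_score:
                best_topic, best_score = topic, score
        if best_topic is not None and best_score >= threshold:
            best_topic["centroid"].update(vector)  # add term counts in place
            best_topic["members"].append(index)
        else:
            topics.append({"centroid": Counter(vector), "members": [index]})
    return topics
```

Because documents are processed one at a time in arrival order, a scheme of this kind suits a stream of news reports: a new report either extends an existing topic or opens a new one, without re-clustering earlier documents.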
To improve the accuracy and effectiveness of hot topic detection, the author introduces a Tibetan named entity recognition algorithm that combines grammatical rules with statistics. Topic hotness is quantified from four factors: the reporting frequency of the topic, its time span, its click volume, and its comment volume; a formula for computing topic attention is given. Tibetan hot topics are presented in three aspects: the topic headline, related documents, and related word groups. Because some users do not understand Tibetan, the system translates the topic title and related word groups using a Tibetan translation dictionary.
On a small corpus, the hot topic detection algorithm reached an accuracy of 85%, which basically meets practical requirements. This work lays a foundation for the analysis of Tibetan public opinion and the study of Tibetan text classification.
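The abstract names four inputs to the attention calculation (reporting frequency, time span, click volume, comment volume) but does not state the formula itself. The sketch below is therefore only a hypothetical stand-in: it assumes every factor contributes positively, damps each count with a logarithm, and combines them with arbitrary weights. `TopicStats`, `attention_score`, `rank_topics`, and the weight values are all illustrative assumptions.

```python
import math
from dataclasses import dataclass


@dataclass
class TopicStats:
    """Per-topic counts named in the abstract; field names are illustrative."""
    report_count: int  # number of documents reporting the topic
    span_days: int     # days between the first and last report
    clicks: int        # total click volume
    comments: int      # total comment volume


def attention_score(stats, weights=(0.4, 0.2, 0.2, 0.2)):
    """Hypothetical attention score: a weighted sum of log-damped counts.

    The weights and the log damping are assumptions, not the thesis's formula.
    """
    features = (stats.report_count, stats.span_days, stats.clicks, stats.comments)
    return sum(w * math.log1p(x) for w, x in zip(weights, features))


def rank_topics(topics):
    """Return (name, score) pairs sorted from highest to lowest score."""
    scored = [(name, attention_score(stats)) for name, stats in topics.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

For example, `rank_topics({"a": TopicStats(50, 7, 12000, 300), "b": TopicStats(3, 1, 40, 2)})` places topic `a` first, since it is larger on every factor. The log damping is one possible way to keep a single very large count, such as clicks, from dominating the other factors.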
Keywords/Search Tags: TDT, Tibetan public opinion, hot topic detection, topic detection, named entity recognition