
Research On Network Information Dissemination And High Performance Computing

Posted on: 2020-12-24
Degree: Master
Type: Thesis
Country: China
Candidate: Q T Qu
GTID: 2428330590978173
Subject: Engineering
Abstract/Summary:
With the spread of smart devices and the rapid development of the Internet, the Internet has become an important place for people to share knowledge and communicate. In particular, with the rapid growth of online news platforms, a great deal of news is forwarded and commented on, and this constant updating leaves a large amount of data on the Internet. This data contains not only positive and beneficial information but also harmful speech, which has an important impact on the security of cyberspace and the stability of society. It is therefore necessary to track online news topics in order to follow the development of online public opinion in real time. This thesis studies techniques for topic detection and tracking and improves topic tracking performance on the basis of previous research. The main contributions are as follows:

1. As time passes and topics evolve, the characteristics of follow-up news reports also change, and a topic model trained on the initial reports cannot effectively capture them. This thesis presents a parallel adaptive topic tracking algorithm based on Naive Bayes classification (PATT-NB). The algorithm uses an adaptive topic update strategy: a minimum feature average confidence threshold is used to select subsequent news reports, which are added to the training set to enrich the topic's feature representation. Experiments show that the algorithm alleviates topic drift and handles large volumes of news topic data effectively.

2. A parallel topic tracking algorithm based on the N-Gram language model (PTT-Gram) is designed. The algorithm uses the N-Gram language model to exploit the word order relationships between words, addressing the shortcomings of the traditional unigram model. PTT-Gram is implemented with the MapReduce distributed computing model. Experiments show that it achieves a good parallel speedup and effectively improves topic tracking accuracy.

3. Building on the above, this thesis designs a parallel adaptive topic tracking algorithm based on the N-Gram language model (PATT-Gram). It uses Naive Bayes classification and combines the advantages of the N-Gram language model with the adaptive topic update strategy, and it is implemented with the MapReduce computing model. Experiments show that PATT-Gram not only improves topic tracking accuracy but also alleviates topic drift during tracking, while parallelizing well.
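The combination of N-Gram features, Naive Bayes classification, and adaptive topic updating can be illustrated with a minimal single-process sketch. It is not the thesis's implementation. The class name AdaptiveBigramNB, the helper bigrams, the labels "on" and "off", and the use of the classifier's posterior probability as the confidence measure are all assumptions made for illustration; the abstract does not define how the minimum feature average confidence threshold is computed, and the MapReduce parallelization is omitted.

```python
import math
from collections import Counter, defaultdict


def bigrams(text):
    """Split on whitespace and return adjacent word pairs (hypothetical helper)."""
    tokens = ["<s>"] + text.lower().split() + ["</s>"]
    return list(zip(tokens, tokens[1:]))


class AdaptiveBigramNB:
    """Illustrative bigram Naive Bayes tracker with a self-updating training set."""

    def __init__(self, threshold=0.9, alpha=1.0):
        self.threshold = threshold          # assumed stand-in for the thesis's confidence threshold
        self.alpha = alpha                  # additive (Laplace) smoothing constant
        self.counts = defaultdict(Counter)  # label -> bigram counts
        self.docs = Counter()               # label -> number of training stories
        self.vocab = set()                  # all bigrams seen in training

    def train(self, text, label):
        feats = bigrams(text)
        self.counts[label].update(feats)
        self.docs[label] += 1
        self.vocab.update(feats)

    def posterior(self, text):
        """Return P(label | text) for every trained label."""
        feats = bigrams(text)
        total_docs = sum(self.docs.values())
        v = len(self.vocab) + 1  # one extra slot for unseen bigrams
        log_scores = {}
        for label in self.docs:
            total = sum(self.counts[label].values())
            score = math.log(self.docs[label] / total_docs)
            for f in feats:
                num = self.counts[label][f] + self.alpha
                score += math.log(num / (total + self.alpha * v))
            log_scores[label] = score
        # Normalize in log space to avoid underflow on long stories.
        top = max(log_scores.values())
        exp = {k: math.exp(s - top) for k, s in log_scores.items()}
        z = sum(exp.values())
        return {k: e / z for k, e in exp.items()}

    def track(self, text):
        """Classify a story; if confident enough, add it to the training set."""
        post = self.posterior(text)
        label = max(post, key=post.get)
        adapted = post[label] >= self.threshold
        if adapted:
            self.train(text, label)  # adaptive update: the topic model follows the story stream
        return label, adapted
```

In this sketch, a story is folded back into the model only when the classifier is confident about its label, which is one simple way to let the topic representation follow later reports without absorbing off-topic stories. In a MapReduce setting, the bigram counting inside train is the step that would naturally be distributed across mappers and summed in reducers.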
Keywords/Search Tags: topic tracking, topic drift, MapReduce model, Naive Bayes classification, N-Gram language model