
Topic Evolution Analysis For Multi-source Online Media

Posted on: 2019-03-14
Degree: Master
Type: Thesis
Country: China
Candidate: C Y Li
Full Text: PDF
GTID: 2428330596460928
Subject: Engineering
Abstract/Summary:
With the rapid development of Internet information technology, multi-source online media, composed of portal news websites, various news media platforms, and search engines, has become an important carrier for describing topics. Topic evolution in large-scale multi-source online media has gradually become an important research direction in information retrieval. Topic models have many advantages in latent semantic mining and topic clustering, and in recent years they have been widely used in topic evolution research. However, current research on topic evolution focuses on mining and classifying different topics in large corpora, while analysis of the evolution process of a single, specific topic remains relatively inadequate. The difficulty is that texts on the same topic are semantically too similar for text similarity calculation or co-occurrence statistics to work well, so traditional models perform poorly.

This paper studies topic evolution in a multi-source online media environment and proposes a topic evolution model based on multi-dimensional features. Built on the hierarchical Dirichlet process, the model takes into account the time, keywords, syntactic relations, and named entities of the content in the topic corpus. In addition, incremental training of word embeddings is used to obtain the semantic relations of the topic context, which addresses the performance degradation caused by overly small semantic granularity. Following the logic by which topics evolve in reality, this paper explores how the focus of a topic changes across different periods and presents an evolution map of the topic. The main contributions of this paper are as follows:

(1) Building a topic feature collection library. News is one of the most direct and objective forms in which topics appear in a multi-source online media environment. This paper uses syntactic parse trees to analyze and extract the relationships among the subjects, objects, and behaviors of topic segments. On the basis of syntactic trees and part-of-speech tagging, the entity relationships among the time, location, participants, and organizations of a topic segment are obtained. Finally, the temporal features, syntactic features (subjects, objects, and behaviors), and named entities (locations, participants, and organizations) of topic segments are extracted to build a topic feature collection library.

(2) Word vector training in the topic context. For the news corpora on the topics under study, incremental word vector training is conducted on the basis of a traditional large-scale news corpus. From the trained word vectors, semantic relations specific to the topic context are constructed to reduce the semantic granularity of text in the process of topic evolution.

(3) Multi-feature-based topic evolution model. Based on the multi-dimensional topic features and word vector relations, this paper proposes a multi-feature-based topic evolution model (MFTEM). Starting from the traditional hierarchical Dirichlet process, the model extends the time dimension in the horizontal direction and adds the feature dimension of topic entities in the vertical direction. Using the contextual semantic relations contained in the word vectors, the model analyzes and mines the focus of the topic at different stages. It effectively describes the evolution of a topic in multi-source online media and establishes the evolution map of the topic.

(4) Experimental verification. To verify the accuracy of the topic evolution analysis, news data on five hot topics were crawled from real portal news websites and various news media platforms, and the experimental results were compared with manual annotations from a third party. The analysis and comparison show that the MFTEM model and the feature selection method proposed in this paper can effectively analyze and describe the evolution of a topic in reality, and can present an evolution map that is consistent with the cognitive logic of how the topic evolved. In addition, the proposed algorithm runs largely automatically and achieves good results without requiring much prior knowledge of, or theoretical expertise in, the topic itself or the model.
Keywords/Search Tags: multi-source online media, topic evolution, hierarchical Dirichlet process, incremental word vectors, evolutionary maps
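
The idea of tracking a topic's focus across time periods and linking the periods into an evolution map can be illustrated with a very small sketch. This is not the MFTEM model from the thesis: it uses no hierarchical Dirichlet process, syntactic features, named entities, or word vectors. As a stated simplification, it substitutes plain keyword counts per time window and cosine similarity between adjacent windows. The function names (`window_keywords`, `cosine`, `evolution_edges`) and the toy tokens are hypothetical and chosen only for illustration.

```python
from collections import Counter
from math import sqrt


def window_keywords(docs, top_k=3):
    """Return {window_label: Counter} holding the top_k most frequent terms per window.

    docs is a list of (window_label, tokens) pairs; tokens is a list of strings.
    """
    counts = {}
    for label, tokens in docs:
        counts.setdefault(label, Counter()).update(tokens)
    return {label: Counter(dict(c.most_common(top_k))) for label, c in counts.items()}


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def evolution_edges(focus, order):
    """Link each window to the next one, weighted by how much their focus overlaps."""
    return [(p, q, round(cosine(focus[p], focus[q]), 3)) for p, q in zip(order, order[1:])]


# Toy, invented data: three time windows of one hypothetical news topic.
docs = [
    ("t1", ["quake", "rescue", "quake"]),
    ("t2", ["rescue", "aid", "rescue"]),
    ("t3", ["trial", "verdict"]),
]
focus = window_keywords(docs)
edges = evolution_edges(focus, ["t1", "t2", "t3"])
print(edges)
```

A low edge weight between two adjacent windows suggests that the focus of the topic has shifted. In this toy data the overlap between `t1` and `t2` comes only from the shared term `rescue`, and `t2` and `t3` share no terms at all.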