With the development of social networks and the mobile Internet, people no longer need to rely on newspapers, TV news, and other traditional channels to learn about the latest social events; they can easily get them from social networks and mobile applications. Reporting social events is no longer the privilege of journalists. Ordinary Internet users can use their mobile phones at any time to record the social events happening around them and upload them to social media sites to share. Uploaded social events usually contain both textual and visual information (images, videos) and are related to certain topics. Therefore, automatically mining and classifying hot topics of social events from large amounts of social media data is very helpful for users to browse, search, and continuously follow social events. In the field of social event analysis, there is a substantial body of mature work, most of which is based on probabilistic topic models. These works can not only jointly model the multi-modal information of social events, but also effectively use supervised information to obtain more discriminative event representations. However, the existing models are still unsatisfactory in terms of topic interpretability. This thesis mainly focuses on the following two aspects:

(1) Existing event classification methods ignore the rich internal semantics of the supervised corpus. In this thesis, a multi-modal supervised topic model with internal semantics (Sem-MMSTM) is proposed for social event classification. The model utilizes two kinds of internal semantics, namely part-of-speech semantics and category semantics, to enhance event representation and topic mining. We evaluate the proposed model on a real large-scale multi-modal dataset. Compared with the existing models, the proposed Sem-MMSTM achieves significant improvements in both classification accuracy (ACC) and topic interpretability (PMI), owing to the introduction of effective semantic information.

(2) Social event documents contain rich knowledge entities and knowledge relations, which are encoded in the form of vectors in knowledge graphs (such as WordNet and Freebase). In this thesis, we propose a multi-modal supervised topic model based on semantic and knowledge extension (Pos-KGE-MMSTM). The internal semantics are introduced through part-of-speech tagging, and the external knowledge is introduced by adding an extended knowledge modality. The multi-modal data of social events and the extended knowledge modality share a topic space at the document level. By using a part-of-speech prior in the text modality, our model avoids the topic model's blind bias toward high-frequency words; in the extended knowledge modality, it automatically obtains the relationships between entities in the knowledge graph and uses them to guide topic modeling. By making full use of the internal semantics and the external knowledge of the corpus, the proposed model has good semantic consistency and can better determine the document representation in the topic space. The experimental results demonstrate the effectiveness of the proposed method.