Design And Implementation Of Topic Detection System For Specific Domain

Posted on: 2019-05-20
Degree: Master
Type: Thesis
Country: China
Candidate: Q W Jiang
Full Text: PDF
GTID: 2348330542498624
Subject: Software engineering
Abstract/Summary:
With the rapid development of the mobile internet, anyone can publish information anytime and anywhere. As the most widely used social platform in China, Weibo produces a huge volume of posts every day, and much of this data is closely related to current social hot events. With the advent of the big data era, more and more experts in the humanities and social sciences use computer technology to assist their research. By mining hot topics and events from social network data, researchers can track how people react to policies or events, which has clear research value. Short texts are noisy and their features are sparse, so this paper improves the feature extraction and text vectorization steps in order to retain as much semantic information as possible. Based on a preliminary survey, this paper chooses Weibo as its research object. By analyzing the characteristics of Weibo users and Weibo text, it proposes two methods to improve the accuracy of text clustering: the first uses repeated strings to improve the accuracy of text segmentation, and the second uses sentence vectors produced by doc2vec for feature expansion. The paper also investigates and analyzes the key technologies used in this process. Building on the doc2vec-based microblog topic detection model, it designs and implements a microblog topic detection system for a microblog data set in the field of tobacco control. The system uses a web crawler to collect 353,797 Weibo posts published from March to June 2017, which serve as the research data. It then cleans and vectorizes the text, clusters it with k-means++, and finally extracts keywords to represent each topic. In addition, the system can analyze a single text and can generate weekly and monthly reports for dates supplied by the user. Users can thus see at a glance which topics have recently been discussed on the microblog platform and how the data is trending.
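
The vectorize, cluster, and extract-keywords stages described above can be illustrated with a minimal sketch. It is not the thesis implementation: L2-normalised term-frequency vectors stand in for the doc2vec sentence vectors, keywords are simply the most frequent terms in a cluster, text is assumed to be already segmented into whitespace-separated tokens, and all function names (`vectorize`, `kmeanspp_init`, `kmeans`, `top_keywords`) are hypothetical. Only the k-means++ seeding rule itself (each new center is sampled with probability proportional to its squared distance from the nearest existing center) is the standard technique.

```python
import math
import random
from collections import Counter


def vectorize(docs):
    """Stand-in for doc2vec (assumption): L2-normalised term-frequency vectors."""
    vocab = sorted({w for d in docs for w in d.split()})
    index = {w: i for i, w in enumerate(vocab)}
    vectors = []
    for d in docs:
        v = [0.0] * len(vocab)
        for w in d.split():
            v[index[w]] += 1.0
        norm = math.sqrt(sum(x * x for x in v)) or 1.0
        vectors.append([x / norm for x in v])
    return vectors


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeanspp_init(points, k, rng):
    """k-means++ seeding: sample each new center with probability
    proportional to its squared distance from the nearest chosen center.
    Assumes the data contains at least k distinct points."""
    centers = [rng.choice(points)]
    while len(centers) < k:
        d2 = [min(sq_dist(p, c) for c in centers) for p in points]
        centers.append(rng.choices(points, weights=d2, k=1)[0])
    return centers


def kmeans(points, k, rng, max_iter=50):
    """Lloyd iterations starting from k-means++ seeds; returns one label per point."""
    centers = kmeanspp_init(points, k, rng)
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster becomes empty
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def top_keywords(docs, labels, cluster, n=3):
    """Represent a cluster by its n most frequent tokens (a simplification)."""
    counts = Counter(
        w for d, lab in zip(docs, labels) if lab == cluster for w in d.split()
    )
    return [w for w, _ in counts.most_common(n)]


# Toy, made-up posts; real Weibo text would first need cleaning and
# Chinese word segmentation.
posts = [
    "smoking ban restaurant",
    "restaurant smoking ban fine",
    "tobacco tax price",
    "tax price tobacco increase",
]
post_labels = kmeans(vectorize(posts), k=2, rng=random.Random(0))
for c in sorted(set(post_labels)):
    print(c, top_keywords(posts, post_labels, c))
```

Because k-means can converge to a local optimum, the grouping of the toy posts depends on the random seed; a real system would typically run several restarts and keep the clustering with the lowest within-cluster distance.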
Keywords/Search Tags:topic detection, word embedding, repeat string, text clustering, microblog