Design And Implementation Of Topic Analysis System For Web Data In Social Network

Posted on: 2018-02-10  Degree: Master  Type: Thesis
Country: China  Candidate: Y J Liu
GTID: 2428330596490056  Subject: Software engineering
Abstract/Summary:
With the rise of the mobile Internet, a new wave of social media platforms such as Weibo and WeChat has made information on the Internet propagate rapidly. Research on how topics propagate helps people perceive what is happening around them and helps maintain social stability. Extracting topic features from massive social-network data and classifying them plays an important role in public opinion analysis. This thesis models massive collections of text documents using topic models. We design and implement a topic analysis platform for online data streams, implement topic analysis algorithms based on the model, and design a distributed architecture that processes data both online and offline. To achieve these goals, the main work of this thesis is as follows:

1. Topic features are extracted from documents with a topic model, upgrading the parametric LDA model to the nonparametric HDP model. The HDP model is further optimized so that it can process online data streams.
2. The application of word vectors in machine learning and natural language processing is analyzed. Word vectors are trained and then extended to paragraph vectors to measure the similarity of documents.
3. Online data accumulates over time, so the algorithms in the system must be incremental. In addition, complex analyses of these data, such as topic analysis and cluster analysis, are time-consuming, so the architecture must be distributed and data processing is parallelized with multiple processes and threads.
4. Weibo is chosen as the data source, and the data are preprocessed according to the characteristics of Weibo. The system is extensible and can be applied to other data sources. The basic processing steps include sentence segmentation, word frequency statistics, vocabulary building and word tagging.

The system provides services to external systems, so it is easy to integrate. Experiments show that the analysis achieves high precision on large-scale data and that the system performs well in real time.
Keywords/Search Tags: Topic Analysis, Topic Model, Incremental Clustering, Paragraph Vector, Word Vector
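Point 1 replaces LDA, which needs the number of topics fixed in advance, with HDP, where the number of topics is inferred from the data. The sketch below illustrates the nonparametric idea only, assuming a truncated stick-breaking (GEM) construction for the top-level topic weights. The function names, the concentration parameter `gamma` and the truncation level are illustrative assumptions; this is not the thesis's online HDP inference procedure.

```python
import random
from typing import List


def stick_breaking_weights(gamma: float, truncation: int, seed: int = 0) -> List[float]:
    """Draw truncated GEM(gamma) stick-breaking weights (hypothetical helper).

    Each step breaks off a Beta(1, gamma) fraction of the remaining stick;
    the last atom takes whatever is left, so the weights sum to 1.
    """
    rng = random.Random(seed)
    weights: List[float] = []
    remaining = 1.0
    for _ in range(truncation - 1):
        fraction = rng.betavariate(1.0, gamma)
        weights.append(remaining * fraction)
        remaining *= 1.0 - fraction
    weights.append(remaining)  # last atom absorbs the leftover mass
    return weights


def effective_topics(weights: List[float], threshold: float = 0.01) -> int:
    """Count topics whose weight exceeds a small (assumed) threshold."""
    return sum(1 for w in weights if w > threshold)


if __name__ == "__main__":
    for gamma in (1.0, 5.0):
        w = stick_breaking_weights(gamma, truncation=50)
        print(f"gamma={gamma}: total={sum(w):.6f}, effective topics={effective_topics(w)}")
```

With a small `gamma`, most of the mass typically falls on a few atoms, and a larger `gamma` tends to spread it over more topics; the truncation keeps the representation finite while leaving the effective number of topics data-dependent.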
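Point 2 measures document similarity with vectors built from word vectors. A trained paragraph vector (for example gensim's `Doc2Vec`) learns a separate vector per document; as a simpler, plainly substituted baseline, the sketch below averages word vectors and compares documents by cosine similarity. The toy embeddings and all function names are hypothetical.

```python
import math
from typing import Dict, List, Optional

# Hypothetical toy embeddings; a real system would train these (e.g. with word2vec).
TOY_VECTORS: Dict[str, List[float]] = {
    "earthquake": [0.9, 0.1, 0.0],
    "rescue": [0.8, 0.2, 0.1],
    "football": [0.0, 0.9, 0.3],
    "match": [0.1, 0.8, 0.4],
}


def document_vector(tokens: List[str], vectors: Dict[str, List[float]]) -> Optional[List[float]]:
    """Average the vectors of in-vocabulary tokens; None if no token is known."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return None
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def document_similarity(doc_a: List[str], doc_b: List[str],
                        vectors: Dict[str, List[float]] = TOY_VECTORS) -> float:
    """Similarity of two tokenized documents; 0.0 if either has no known words."""
    va, vb = document_vector(doc_a, vectors), document_vector(doc_b, vectors)
    if va is None or vb is None:
        return 0.0
    return cosine(va, vb)


print(document_similarity(["earthquake", "rescue"], ["football", "match"]))
```

Averaging ignores word order, which is one reason learned paragraph vectors are usually preferred when enough training data is available.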
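Point 3 requires incremental algorithms because online data keeps arriving. One common incremental approach, shown here as an assumption rather than the thesis's method, is single-pass clustering: each new document vector joins the most similar existing cluster if the cosine similarity reaches a threshold, otherwise it starts a new cluster, and centroids are updated as running means. The class names and the threshold value are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


@dataclass
class Cluster:
    centroid: List[float]
    size: int = 1

    def absorb(self, vec: List[float]) -> None:
        """Add one vector and update the centroid as a running mean."""
        self.size += 1
        self.centroid = [c + (v - c) / self.size for c, v in zip(self.centroid, vec)]


class SinglePassClusterer:
    """Assigns each arriving vector to the most similar cluster, or opens a new one."""

    def __init__(self, threshold: float = 0.8) -> None:
        self.threshold = threshold  # hypothetical similarity cut-off
        self.clusters: List[Cluster] = []

    def add(self, vec: List[float]) -> int:
        """Process one vector and return the index of the cluster it joined."""
        best_idx, best_sim = -1, -1.0
        for idx, cluster in enumerate(self.clusters):
            sim = cosine(vec, cluster.centroid)
            if sim > best_sim:
                best_idx, best_sim = idx, sim
        if best_idx >= 0 and best_sim >= self.threshold:
            self.clusters[best_idx].absorb(vec)
            return best_idx
        self.clusters.append(Cluster(centroid=list(vec)))
        return len(self.clusters) - 1


clusterer = SinglePassClusterer(threshold=0.9)
labels = [clusterer.add(v) for v in ([1.0, 0.0], [0.95, 0.05], [0.0, 1.0])]
print(labels)
```

Each document is seen once and only the centroids are kept, so the cost per new document grows with the number of clusters rather than with the size of the accumulated history.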
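Point 3 also parallelizes data processing. A minimal map-reduce sketch, assuming word counting as the workload: documents are split into chunks, each chunk is counted by a worker, and the partial counters are merged. The function names and chunk size are hypothetical. In CPython, threads mainly help with I/O-bound work because of the GIL; `concurrent.futures.ProcessPoolExecutor` offers the same `map` interface for CPU-bound work, provided the worker function is defined at module level so it can be pickled.

```python
import operator
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce
from typing import List


def count_chunk(docs: List[str]) -> Counter:
    """Map step: word frequencies for one chunk of whitespace-tokenized documents."""
    counts: Counter = Counter()
    for doc in docs:
        counts.update(doc.lower().split())
    return counts


def chunked(items: List[str], size: int) -> List[List[str]]:
    """Split a list into consecutive chunks of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def parallel_word_counts(docs: List[str], chunk_size: int = 1000, workers: int = 4) -> Counter:
    """Count chunks concurrently, then merge the partial counters (reduce step)."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(count_chunk, chunked(docs, chunk_size)))
    return reduce(operator.add, partials, Counter())


print(parallel_word_counts(["a b", "b c", "c c", "a"], chunk_size=2, workers=2))
```

Because merging counters is associative, the same split-count-merge pattern extends from threads in one process to separate processes or machines.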
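Point 4 lists sentence segmentation, word frequency statistics and vocabulary building as basic processing steps. The sketch below shows one way to chain them, assuming Weibo-style noise such as URLs, @mentions and `#topic#` markers. The regular expressions, `min_count` and function names are hypothetical, and the `\w+` tokenizer is only a placeholder: Chinese text would need a proper word segmenter at that step. Word tagging is not covered.

```python
import re
from collections import Counter
from typing import Dict, List, Tuple

URL = re.compile(r"https?://\S+")
MENTION = re.compile(r"@\S+")
HASHTAG = re.compile(r"#([^#]+)#")  # Weibo topics are wrapped as #topic#
SENTENCE_END = re.compile(r"(?<=[。！？!?.])\s*")  # split after end punctuation


def clean_post(text: str) -> str:
    """Drop URLs and @mentions; keep hashtag words but remove the # markers."""
    text = URL.sub(" ", text)
    text = MENTION.sub(" ", text)
    return HASHTAG.sub(r" \1 ", text)


def split_sentences(text: str) -> List[str]:
    """Split on sentence-ending punctuation and drop empty pieces."""
    return [s.strip() for s in SENTENCE_END.split(text) if s.strip()]


def tokenize(sentence: str) -> List[str]:
    """Placeholder tokenizer; a Chinese pipeline would call a word segmenter here."""
    return re.findall(r"\w+", sentence.lower())


def build_vocabulary(posts: List[str], min_count: int = 2) -> Tuple[Counter, Dict[str, int]]:
    """Return word frequencies and a word->id map for words seen at least min_count times."""
    counts: Counter = Counter()
    for post in posts:
        for sentence in split_sentences(clean_post(post)):
            counts.update(tokenize(sentence))
    kept = sorted((w for w, c in counts.items() if c >= min_count),
                  key=lambda w: (-counts[w], w))  # most frequent first, ties alphabetical
    return counts, {word: idx for idx, word in enumerate(kept)}


posts = [
    "#earthquake# Rescue teams arrived. More rescue teams are coming! http://t.cn/abc",
    "@someone earthquake rescue update? Teams are safe.",
]
counts, vocab = build_vocabulary(posts)
print(vocab)
```

Keeping the frequency counts alongside the vocabulary lets later stages, such as topic modelling, discard rare words without re-reading the posts.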