
Research On Hot Topic Recognition And Trend Forecasting Based On Forum

Posted on: 2016-12-04
Degree: Master
Type: Thesis
Country: China
Candidate: H D Zhang
Full Text: PDF
GTID: 2208330461485689
Subject: Computer technology
Abstract/Summary:
With the rapid development of the Internet, the forum has emerged as one of the rising Internet media. Because forums are shared, real-time, and interactive, they have become a gathering place for a large number of Internet users and one of the main channels for obtaining information. Users can post a topic for discussion, ask a question, or give their views on hot social topics, so forums have gradually become the platform on which most users share information, browse information, and express their opinions. However, forums generate a huge amount of information every day and it spreads quickly, which in turn produces ever more topics. In this situation, being able to browse the hot topics quickly in order to understand the current focus of public attention is very meaningful.

First, the thesis addresses BBS data collection. The main issues studied are the difficulty of identifying pagination links, duplicate forum links, the design of the queue and database, and multi-threading. The extracted text is stored in a database and serves as the experimental data source for the thesis.

Second, hot topics are studied on the basis of the collected data. To suit the particular nature of forum data, a multi-vector policy is proposed for text representation: the single vector used by the traditional vector space model (VSM) to calculate similarity is replaced by four sub-vectors (time, place, person, event), and the per-vector similarities are then integrated into a final similarity value. A comparison with the traditional VSM demonstrates the accuracy of the algorithm.

For topic detection, posts accumulate and are updated daily in chronological order, so a secondary clustering scheme is proposed. Each day's data is first clustered locally to obtain temporary topics; the temporary topics are then clustered together with the existing topics to obtain the final topic set.
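The multi-vector similarity and the two-stage clustering described above can be sketched as follows. This is only an illustration under stated assumptions, not the thesis's implementation: the field weights in `WEIGHTS`, the weighted-sum integration, the use of merged term lists as a cluster representative, and all function names are hypothetical, since the abstract does not specify them.

```python
import math
from collections import Counter

FIELDS = ("time", "place", "person", "event")
# Hypothetical weights: the abstract does not say how sub-similarities are integrated.
WEIGHTS = {"time": 0.2, "place": 0.2, "person": 0.2, "event": 0.4}

def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def multi_vector_similarity(doc_a, doc_b):
    """Weighted sum of the per-field cosine similarities (assumed integration rule)."""
    return sum(WEIGHTS[f] * cosine(Counter(doc_a[f]), Counter(doc_b[f]))
               for f in FIELDS)

def single_pass(docs, threshold, clusters=None):
    """Single-Pass clustering: each doc joins its most similar cluster if the
    similarity reaches the threshold, otherwise it opens a new cluster.
    A cluster is represented by the merged term lists of its members.
    Note: a list passed as `clusters` is extended in place."""
    clusters = [] if clusters is None else clusters
    for doc in docs:
        best, best_sim = None, 0.0
        for cluster in clusters:
            sim = multi_vector_similarity(doc, cluster)
            if sim > best_sim:
                best, best_sim = cluster, sim
        if best is not None and best_sim >= threshold:
            for f in FIELDS:
                best[f] = best[f] + doc[f]  # absorb the doc's terms
        else:
            clusters.append({f: list(doc[f]) for f in FIELDS})
    return clusters

def secondary_cluster(day_posts, old_topics, threshold):
    """Stage 1: cluster one day's posts locally into temporary topics.
    Stage 2: cluster the temporary topics against the existing topics."""
    temporary = single_pass(day_posts, threshold)
    return single_pass(temporary, threshold, old_topics)
```

In this sketch a post is a dict mapping each of the four fields to a list of terms; how those terms are extracted from forum text is outside its scope.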
Because the number of topics in a dynamic data source cannot be determined in advance, the Single-Pass algorithm is used for clustering: its advantage is that it generates clusters automatically from a similarity threshold. The experimental data support the reasonableness of the proposed algorithm.

Finally, for trend forecasting, a time-series ARIMA model is fitted to several topic indicators for predictive analysis. The ARIMA model and its advantages are introduced first, and the modeling steps are given: model identification, parameter estimation, model checking, and prediction. The stationarity of the time series is tested with the ADF test, and the model orders are determined from the autocorrelation and partial autocorrelation function plots. The residual sequence plot is then used to verify the accuracy of the fit. By comparing the predicted values with the actual values, the thesis gives the trend of hot topics over a specific period of time.
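The forecasting step can be illustrated with a deliberately simplified stand-in for a full ARIMA fit: an ARIMA(1,1,0) model without a constant, estimated by least squares in plain Python. This is a sketch only. The model order, the function names, and the idea of forecasting a daily reply count are assumptions; the thesis selects orders from ACF/PACF plots after an ADF test, and a real analysis would normally rely on a statistics library for that.

```python
def difference(series):
    """First-order differencing, the 'I' (d = 1) part of ARIMA."""
    return [b - a for a, b in zip(series, series[1:])]

def fit_ar1(diffs):
    """Least-squares AR(1) coefficient with no intercept: d_t ~ phi * d_{t-1}."""
    num = sum(prev * cur for prev, cur in zip(diffs, diffs[1:]))
    den = sum(prev * prev for prev in diffs[:-1])
    return num / den if den else 0.0

def forecast(series, steps):
    """Forecast `steps` future values with ARIMA(1,1,0), no constant term.
    Requires at least two observations."""
    diffs = difference(series)
    phi = fit_ar1(diffs)
    level, last_diff = series[-1], diffs[-1]
    predictions = []
    for _ in range(steps):
        last_diff = phi * last_diff   # predicted next change
        level += last_diff            # undo the differencing
        predictions.append(level)
    return predictions
```

For example, `forecast(daily_replies, 3)` would extend a hypothetical list of daily reply counts by three days; comparing such predictions with held-out actual values mirrors the evaluation described above.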
Keywords/Search Tags: forum data collection, multi-vector policy, secondary clustering, Single-Pass, ARIMA model