
Research on Micro-blog Hot Topic Extraction and Its Application

Posted on: 2017-05-15  Degree: Master  Type: Thesis
Country: China  Candidate: C F Zhou  Full Text: PDF
GTID: 2348330482983974  Subject: Electronics and Communications Engineering
Abstract/Summary:
As Internet technology has developed and matured, people can quickly obtain vast amounts of information online. Micro-blog, a public social platform with a large user base and a fast mode of information dissemination, has risen rapidly and come to play an important role in people's lives. As more and more attention is paid to micro-blog and its many users post every day, the resulting content conceals many hot topics, including topics on state affairs and natural disasters, as well as much harmful information. Using the high computing speed of computers to obtain valuable information from this mass of content in a timely manner is of great significance for monitoring and guiding public opinion.
Extracting hot topics from micro-blog amounts to text clustering of micro-blog content. That content has its own characteristics: its grammatical structure is not subject to any restrictions, which makes text clustering a daunting challenge. Traditional text clustering mostly processes and analyzes text at the word level. Because of its length limit, a micro-blog post is a short text: it contains few words, each word carries high sensitivity, and interference arises easily, so the clustering used in this thesis is based on sentences. Although Chinese expression is rich and varied, under a word-count limit, accurately expressing a view or an event leads many people to use repeated or similar descriptions, so sentence-based clustering works better for micro-blog text.
This thesis analyzes micro-blog content in depth and, based on its characteristics, proposes the CBOS algorithm, which merges sentences judged to be similar in order to extract micro-blog hot topics. The algorithm takes advantage of the union-find (disjoint-set) data structure, which not only improves extraction efficiency but also makes the extraction results more accurate.
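The idea of merging similar sentences with a union-find structure can be sketched roughly as follows. This is a minimal illustration, not the CBOS algorithm itself, whose details the abstract does not give. The names `DisjointSet`, `similarity`, and `cluster_sentences`, the threshold of 0.6, and the use of Python's `difflib.SequenceMatcher` as a stand-in similarity measure are all assumptions made for the example.

```python
from difflib import SequenceMatcher


class DisjointSet:
    """Union-find with path halving and union by size."""

    def __init__(self, n):
        self.parent = list(range(n))
        self.size = [1] * n

    def find(self, x):
        # Walk up to the root, shortening the path along the way.
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        # Attach the smaller tree under the larger one.
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]


def similarity(a, b):
    # Stand-in similarity in [0, 1]; an assumption, not the thesis's measure.
    return SequenceMatcher(None, a, b).ratio()


def cluster_sentences(sentences, threshold=0.6):
    """Merge sentences whose pairwise similarity reaches the threshold."""
    ds = DisjointSet(len(sentences))
    for i in range(len(sentences)):
        for j in range(i + 1, len(sentences)):
            if similarity(sentences[i], sentences[j]) >= threshold:
                ds.union(i, j)
    groups = {}
    for i, sentence in enumerate(sentences):
        groups.setdefault(ds.find(i), []).append(sentence)
    # Largest clusters first: these are the hot-topic candidates.
    return sorted(groups.values(), key=len, reverse=True)
```

Because union-find merges are transitive, two sentences end up in the same cluster whenever a chain of similar sentences connects them, even if the two are not directly similar. The pairwise loop here is quadratic in the number of sentences; a real system would likely need some form of candidate filtering first.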
For text similarity judgment, this thesis uses edit distance to calculate the similarity between sentences. Traditional edit distance has only three operations: insertion, deletion, and replacement. Analysis of the results shows that adding an exchange operation to the calculation brings it closer to reality and makes the result more accurate; in this thesis, the exchanges are calculated through the largest subsequence. The thesis therefore adds the exchange operation to edit distance, uses the extended distance to calculate the similarity between sentences, and then applies a union-find-based algorithm to cluster micro-blog content by sentence and obtain the final results. Analysis and comparison of the experimental results show that the CBOS algorithm is better suited to micro-blog topic extraction than traditional text clustering algorithms, and that the system designed and implemented on the basis of this algorithm meets the expected requirements.
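To make the effect of an exchange operation concrete, the sketch below extends the classic dynamic-programming edit distance with a swap of two adjacent characters (the optimal string alignment variant of Damerau-Levenshtein distance). This is a standard technique swapped in for illustration. It is not the subsequence-based exchange calculation the thesis describes, which the abstract does not specify. The function names and the normalization by the longer sentence's length are assumptions.

```python
def edit_distance_with_swap(a, b):
    """Edit distance allowing insert, delete, replace, and adjacent swap."""
    m, n = len(a), len(b)
    # d[i][j] = distance between the first i chars of a and first j of b.
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # delete
                d[i][j - 1] + 1,         # insert
                d[i - 1][j - 1] + cost,  # replace (or match)
            )
            # Adjacent swap: "ab" -> "ba" counts as one operation.
            if (i > 1 and j > 1
                    and a[i - 1] == b[j - 2]
                    and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[m][n]


def sentence_similarity(a, b):
    """Map the distance to a similarity in [0, 1] (assumed normalization)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance_with_swap(a, b) / longest
```

With only the three traditional operations, turning "ab" into "ba" costs 2; with the swap it costs 1, so sentences that differ only in the order of neighboring characters score as more similar. Python strings are sequences of Unicode code points, so the same functions apply to Chinese sentences character by character.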
Keywords/Search Tags: micro-blog, text clustering, CBOS, edit distance, similarity calculation