Big Text Data Analysis And Suggestion Service Based On Microblog

Posted on: 2015-12-19
Degree: Master
Type: Thesis
Country: China
Candidate: X Chen
Full Text: PDF
GTID: 2298330452450801
Subject: Computer software and theory
Abstract/Summary:
Microblogging is a user-relationship platform for information sharing, dissemination, and communication. Through the Web, WAP, and various client applications, users can form personal communities and post updates of about 140 characters that are shared immediately. Compared with traditional social networks, microblogging spreads information more effectively and clusters members more strongly, an advantage that has quickly made it one of the main social media. As an important information source and communication channel, it also plays a key role in a growing number of social events, and the analysis of microblog data has become a hot research topic. We take Sina microblog data as our research object, process the text of micro-topic data, analyze its media characteristics, and study how to optimize the classic search query suggestion service. Finally, we discuss data processing efficiency under the Hadoop big data processing framework.

In this paper, our work mainly includes the following four aspects:

1) To extract Sina micro-topic data, we obtain the data by parsing HTML pages. This works around the problem that the Sina API data interface is not fully open to the public, so complete data cannot be obtained through it.

2) We propose four measures to evaluate user participation, user activity, topic popularity, and topic activity, which help us understand the media features of Sina microblog topics. We focus on topic characteristics and present a topic evolution graph. We also study microblog semantic extraction based on the LDA topic model. The experimental results show that microblog text is strongly topical and timely. However, because microblog text is too short, directly applying LDA for latent semantic extraction does not give ideal results.

3) For big data text processing, we use the MapReduce programming model under the Hadoop framework to handle a large amount of microblog text. We build an inverted index of query words over the documents on the Hadoop platform. With it we can retrieve microblog posts for a given query text, and we also discuss how the processing time varies with data size and the number of nodes.

4) We propose a Web search query suggestion approach based on microblog topics, which helps users quickly express their information needs and reach the information they need more accurately. With the rise of new social network media, more and more real-time hot topics emerge within a short time. It is difficult for a search system to give effective query suggestions for fresh aspects of the Web, especially for queries that have little or no information in the query log. Our approach exploits the strong topicality and rapidity of microblogs by making full use of micro-topic comments to mine potential recommendations. Experimental results show the effectiveness of our approach for query suggestion on fresh aspects of the Web.
Keywords/Search Tags: Microblog, Topic Evolution, Hadoop, Big Data, Query Suggestion
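
The abstract mentions an inverted index over microblog posts built with the MapReduce programming model. As a rough illustration only, and not the thesis's actual Hadoop code, the following single-process Python sketch walks through the same map, shuffle, and reduce steps. All function names and the sample posts are hypothetical, and the regex tokenizer is a stand-in: real Sina microblog text is Chinese and would presumably need a word segmenter.

```python
import re
from collections import defaultdict


def map_phase(doc_id, text):
    """Mapper: emit one (term, doc_id) pair per token in a post."""
    # Assumption: simple regex tokenization; Chinese text would need a segmenter.
    for term in re.findall(r"\w+", text.lower()):
        yield term, doc_id


def shuffle(pairs):
    """Group emitted doc ids by term, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for term, doc_id in pairs:
        grouped[term].append(doc_id)
    return grouped


def reduce_phase(term, doc_ids):
    """Reducer: collapse the doc ids for one term into a sorted posting list."""
    return term, sorted(set(doc_ids))


def build_inverted_index(posts):
    """Run map, shuffle, and reduce in-process over a dict of {doc_id: text}."""
    pairs = (
        pair
        for doc_id, text in posts.items()
        for pair in map_phase(doc_id, text)
    )
    return dict(reduce_phase(term, ids) for term, ids in shuffle(pairs).items())


def search(index, query):
    """Return ids of posts that contain every query term (AND semantics)."""
    terms = re.findall(r"\w+", query.lower())
    if not terms:
        return []
    result = set(index.get(terms[0], []))
    for term in terms[1:]:
        result &= set(index.get(term, []))
    return sorted(result)


# Hypothetical sample posts, for illustration only.
posts = {
    "p1": "Hadoop makes big data processing easier",
    "p2": "Microblog topics spread fast",
    "p3": "Big data on microblog topics",
}
index = build_inverted_index(posts)
```

Calling `search(index, "microblog topics")` looks up each query term's posting list and intersects them. On a real cluster the mapper and reducer would run as separate tasks across nodes, which is where the data-size and node-count timing comparison mentioned in the abstract would come in.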