Research Of Twitter Retrieval Based On Semantic Similarity Computing And Twitter Storm Platform

Posted on:2015-10-06

Degree:Master

Type:Thesis

Country:China

Candidate:H F Xiao

Full Text:PDF

GTID:2298330452950784

Subject:Computer application technology

Abstract/Summary:

With the rapid development of the Internet industry, micro-blogging products are gaining popularity both at home and abroad. By providing users with centralized and open social networking services, they have gradually developed into a new type of media with increasingly high influence. Given the large scale and real-time nature of micro-blogging data, how to deliver information that interests users from massive, dynamically updated micro-blogging data has become particularly important.

The micro-blog retrieval and sorting method discussed in this paper is based on short-text feature expansion and similarity calculation. The work proceeds in three steps. First, each micro-blog (here, a tweet) is expanded to enrich its semantic features, which supports the relatedness between the query text and the retrieved results. Second, we compute similarity between micro-blogs using the WordNet dictionary, aiming for relatively high precision and recall. Third, the similarity value computed in the previous step is used as the sorting criterion in a simulated real-time micro-blog retrieval environment, which performs retrieval and sorting and provides a list of related micro-blogs for each micro-blog retrieved.

To enrich the semantic features of micro-blogs, we take the nouns in a micro-blog as representative keywords that express its topic, and expand these nouns with associated words and phrases to lengthen the micro-blog. Specifically, Wikipedia is chosen as the source of semantic features for expansion. For each noun in a micro-blog, we use it as a query in Wikipedia, find the "category" entry in the search result page, and add the words listed under "category" (the categories to which the noun is assigned) to the original micro-blog as additional explanatory words. Experiments are also conducted to show that this expansion improves the quality of similarity calculation to a certain degree.

To obtain higher accuracy and precision, this paper takes full advantage of the special structure of WordNet, an online English lexical database, in computing semantic similarity between micro-blogs. Specifically, we use the path-length-based method proposed in [37], which takes into account both the node path length and the least common subsumer in WordNet. We also conduct experiments comparing our method with the traditional cosine similarity method based on the vector space model, to verify that the former improves precision and recall in finding related micro-blogs to some extent.

To simulate a real-time micro-blog retrieval system, we studied the architecture and application of the open-source real-time data processing platform Twitter Storm, and simulated real-time, distributed processing in local mode. Specifically, we defined our own micro-blog retrieval topology that can be embedded into the Twitter Storm platform and implemented the function of each component in the topology, including preprocessing of the original tweet dataset, information transmission between components, parallel computation of tweet similarity across many components, maintenance of the similarity table, sorting of retrieved results by similarity value, and provision of related micro-blogs for each micro-blog in the search results.
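The contrast between taxonomy-based similarity and the vector space cosine baseline can be illustrated with a minimal Python sketch. It is not the method of [37], whose exact formula is not given here. It assumes a Wu-Palmer-style score (twice the depth of the least common subsumer divided by the summed depths of the two words) as a stand-in, and a tiny hand-made is-a taxonomy in place of WordNet. All names (`PARENT`, `wup_similarity`, `semantic_similarity`, `cosine_similarity`, `rank`) and the sample data are hypothetical.

```python
import math
from collections import Counter

# Toy is-a taxonomy standing in for WordNet (child -> parent). Hypothetical data.
PARENT = {
    "entity": None,
    "animal": "entity",
    "artifact": "entity",
    "dog": "animal",
    "cat": "animal",
    "car": "artifact",
    "bus": "artifact",
}


def ancestors(word):
    """Return the chain [word, parent, ..., root] in the toy taxonomy."""
    chain = []
    while word is not None:
        chain.append(word)
        word = PARENT[word]
    return chain


def depth(word):
    """Number of nodes from the word up to the root (the root has depth 1)."""
    return len(ancestors(word))


def wup_similarity(a, b):
    """Wu-Palmer-style score: 2 * depth(lcs) / (depth(a) + depth(b))."""
    if a not in PARENT or b not in PARENT:
        # Assumption: words outside the taxonomy only match themselves.
        return 1.0 if a == b else 0.0
    chain_b = set(ancestors(b))
    # The first ancestor of `a` that is also an ancestor of `b` is the
    # least common subsumer; the shared root guarantees one exists.
    lcs = next(node for node in ancestors(a) if node in chain_b)
    return 2.0 * depth(lcs) / (depth(a) + depth(b))


def semantic_similarity(nouns_a, nouns_b):
    """Average best-match word similarity, symmetrized over both texts."""
    if not nouns_a or not nouns_b:
        return 0.0

    def directed(src, dst):
        return sum(max(wup_similarity(s, d) for d in dst) for s in src) / len(src)

    return 0.5 * (directed(nouns_a, nouns_b) + directed(nouns_b, nouns_a))


def cosine_similarity(tokens_a, tokens_b):
    """Baseline: cosine of term-frequency vectors (vector space model)."""
    va, vb = Counter(tokens_a), Counter(tokens_b)
    dot = sum(va[t] * vb[t] for t in va)
    norm = math.sqrt(sum(v * v for v in va.values())) * math.sqrt(
        sum(v * v for v in vb.values())
    )
    return dot / norm if norm else 0.0


def rank(query, tweets, score):
    """Sort tweet ids by descending similarity to the query."""
    return sorted(tweets, key=lambda tid: score(query, tweets[tid]), reverse=True)


query = ["dog"]
tweets = {"t1": ["car", "bus"], "t2": ["cat"]}
print(rank(query, tweets, semantic_similarity))  # ['t2', 't1']
print(cosine_similarity(query, tweets["t2"]))    # 0.0
```

In this toy case the cosine baseline scores both tweets at zero because they share no terms with the query, while the taxonomy-based score ranks the tweet about a cat above the one about vehicles. With real data the nouns would come from a part-of-speech tagger and the taxonomy from WordNet itself.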

Keywords/Search Tags:

Twitter, Weibo, Semantic expansion, Similarity computing, WordNet, Twitter Storm

Related items

1	Research And Implementation Of Cloud Computing Platform Monitoring System Based On Twitter Storm
2	Why We Follow: Exploring How Culture Shapes Users' Motivation for Following Sport Organizations on Twitter and Weibo
3	Are There Perks to Being a Twitter Wallflower? Peripheral Participants in a Twitter-Enabled Learning Space in Public Relations and Higher Educatio
4	Education all a'Twitter: Twitter's role in educational technology
5	The expansion of social media in agriculture: A user profile of Twitter's @agchat, @followfarmer and @trufflemedia followers
6	Based On The Media Function Of Weibo Value Research
7	Is Twitter a counter public?: Comparing individual and community forces that shaped local Twitter and newspaper coverage of the BP oil spill
8	Conceptual Semantic Similarity Calculation Based On WordNet And Its Application Research
9	Discovering Twitter Users' Off-line Community
10	The Twitter Management Of American Local Newspapers And The Revelation Of Chinese