Feature Analysis And Ranking Algorithms Of Microblog Domain Experts

Posted on: 2016-09-19
Degree: Master
Type: Thesis
Country: China
Candidate: N Liu
GTID: 2348330476955770
Subject: Software engineering
Abstract/Summary:
As microblogs have become part of daily life, more and more people use microblog platforms to share information, follow hot topics, and find users with similar interests to interact with, so the analysis of microblog users is attracting growing attention. Microblog search engines can currently find celebrities by subject area (keyword), but this function only returns users certified by the microblog platform, and many certified users do not publish posts relevant to the subject area. In this thesis, we analyze the features of domain users and design ranking algorithms to improve retrieval quality. Our study can effectively select the optimal combination of features for identifying experts in a subject area.

Our experimental data consists of 1,200 microblog users in the IT subject area, collected from Sina Weibo in China; the domain users' data includes non-text features and 280,733 Weibo posts. First, we process the microblog data and analyze user influence based on the non-text features of the domain users. We then put forward three sorting methods that rank domain users by their non-text features in order to identify the domain experts. Experimental results show that the method based on non-text features is feasible and that its precision is higher than that of the traditional identification method.

Next, we use the text feature of the domain users' microblogs to analyze user influence. We introduce two similarity measures, Jaccard similarity and cosine similarity, to calculate the similarity between a user's post content and the domain keywords. We then compare the precision of the two similarity methods with the precision of the methods based on non-text features. The comparison shows that the similarity-based methods achieve higher precision than the methods based on non-text features.

Finally, we propose fusing the non-text features with the text feature, and we design three ranking algorithms: integration rank, a greedy algorithm, and SVM Rank. We use these three ranking methods to analyze the fused features of the domain users and identify the domain experts. The experimental results show that the highest precision is achieved by the feature combination method, which aggregates the non-text features and the text feature and selects the optimal feature combination using the greedy algorithm.
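
The abstract names Jaccard and cosine similarity and a greedy search for the best feature combination, but gives no formulas or procedure. The Python sketch below shows one plausible reading, not the thesis's actual implementation. It assumes that posts and domain keywords are already tokenized, that features are normalized to comparable ranges, that a combination of features is scored by a plain sum, and that precision is measured over the top k ranked users. The function names, the feature names `followers` and `text_sim`, and the toy data are all hypothetical.

```python
import math
from collections import Counter


def jaccard_similarity(post_tokens, keyword_tokens):
    """Jaccard similarity of two token sets: |A & B| / |A | B|."""
    a, b = set(post_tokens), set(keyword_tokens)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def cosine_similarity(post_tokens, keyword_tokens):
    """Cosine similarity of the term-frequency vectors of two token lists."""
    a, b = Counter(post_tokens), Counter(keyword_tokens)
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def precision_at_k(ranked_users, experts, k):
    """Fraction of the top-k ranked users that are labeled experts."""
    return sum(1 for u in ranked_users[:k] if u in experts) / k


def greedy_feature_selection(users, feature_names, experts, k):
    """Forward selection: keep adding the feature that most improves
    precision@k, and stop when no remaining feature improves it.

    `users` maps a user id to a dict of normalized feature values.
    Scoring a combination by summing its features is an assumption.
    """
    selected, best = [], 0.0
    remaining = list(feature_names)
    while remaining:
        scored = []
        for name in remaining:
            trial = selected + [name]
            ranking = sorted(
                users,
                key=lambda u: sum(users[u][f] for f in trial),
                reverse=True,
            )
            scored.append((precision_at_k(ranking, experts, k), name))
        top_score, top_name = max(scored, key=lambda pair: pair[0])
        if top_score <= best:
            break
        selected.append(top_name)
        remaining.remove(top_name)
        best = top_score
    return selected, best


# Toy data (hypothetical): one non-text feature and one text feature.
toy_users = {
    "u1": {"followers": 0.9, "text_sim": 0.1},
    "u2": {"followers": 0.2, "text_sim": 0.9},
    "u3": {"followers": 0.8, "text_sim": 0.8},
    "u4": {"followers": 0.1, "text_sim": 0.2},
}
toy_experts = {"u2", "u3"}
chosen, score = greedy_feature_selection(
    toy_users, ["followers", "text_sim"], toy_experts, k=2
)
```

In this sketch, the `text_sim` value would come from one of the two similarity functions applied to a user's posts and the domain keyword list. Forward selection is only one form a greedy search could take; the thesis may use a different stopping rule or scoring function.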
Keywords/Search Tags: domain experts, feature analysis, ranking algorithm, precision, optimal feature combination