
Study On People-search In Microblogs

Posted on: 2016-01-01
Degree: Doctor
Type: Dissertation
Country: China
Candidate: B Liang
GTID: 1108330503456155
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of social networks, more and more ordinary users and experts are using microblogs, acting as both consumers and producers of information. As a result, all kinds of users create a large amount of data. Expert search is a long-standing research problem that has mainly relied on limited data, such as enterprise data. Searching for people in social networks, however, faces new challenges. First, the number of people in social networks is very large. Second, the number of topics is huge. Third, unlike enterprise data, social network data are sparse, and some of them are fake or unstable. Earlier research on enterprise data cannot handle these problems. Inspired by the work of Twitter and Cngos, this dissertation addresses the problem of people search in social networks, using data from weibo.com as the basis of the study.

For social network data acquisition, to cope with social networks blocking crawlers, we proposed crowdsourcing and anthropomorphic (human-like) crawling solutions. We developed and maintained the Chinese Crawler Union, which more than 30,000 registered users have joined. We crawled 250 million user profiles, the following relations of 80 million users, and more than 20 billion microblog posts, which meets the data needs of the vast majority of social network researchers.

For indexing large-scale social data, we propose a method for bottom-up construction of an index over static data. It can index 10 billion records at a cost of only one bit per record, and it reaches 1.2 million queries per second under random concurrent queries.
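The abstract does not describe how the bottom-up index achieves one bit per record. One well-known way to get index overhead that low is a sparse index bulk-loaded bottom-up over sorted, static data: only the first key of each block is stored, so a 64-bit key per block of 64 records costs one bit per record. The sketch below assumes that design; `BLOCK`, `FANOUT`, `build_index`, and `lookup` are hypothetical names, and this is not presented as the dissertation's actual method.

```python
from bisect import bisect_right

BLOCK = 64    # records per leaf block (hypothetical choice)
FANOUT = 64   # separator keys per internal node (hypothetical choice)

def build_index(records):
    """Bulk-load a static sparse index bottom-up.

    records: list of (key, value) pairs already sorted by key.
    levels[0] holds the first key of every leaf block; each higher level
    holds the first key of every FANOUT entries of the level below, until
    the top level fits in a single node.
    """
    blocks = [records[i:i + BLOCK] for i in range(0, len(records), BLOCK)]
    levels = [[b[0][0] for b in blocks]]
    while len(levels[-1]) > FANOUT:
        below = levels[-1]
        levels.append([below[i] for i in range(0, len(below), FANOUT)])
    return blocks, levels

def lookup(blocks, levels, key):
    """Descend from the top level, searching one node's range per level."""
    lo, hi = 0, len(levels[-1])
    pos = -1
    for depth in range(len(levels) - 1, -1, -1):
        keys = levels[depth]
        # Last separator <= key inside the current node's range
        pos = bisect_right(keys, key, lo, hi) - 1
        if pos < lo:
            return None  # key is smaller than every indexed key
        if depth > 0:
            lo = pos * FANOUT
            hi = min(lo + FANOUT, len(levels[depth - 1]))
    # Binary search inside the selected leaf block
    block = blocks[pos]
    bkeys = [k for k, _ in block]
    i = bisect_right(bkeys, key) - 1
    return block[i][1] if i >= 0 and bkeys[i] == key else None
```

With 64-bit keys and 64 records per block, the leaf level costs 64 / 64 = 1 bit per record, and each higher level adds only 1/64 of the level below it. Because the data are static, no space is reserved for inserts, which is what keeps the overhead this small.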
Both the storage cost and the query throughput are significantly better than those of state-of-the-art open-source NoSQL database implementations such as LevelDB and Tokyo Cabinet. Finally, we have open-sourced THUIR-DB, which has been used by many high-tech companies.

To address the sparsity of user tags, we propose a different solution for each of two cases: tag prediction for users without any tags, and tag expansion for users with some tags. Tag prediction adopts a two-stage method: it first predicts each user's group of close friends with logistic regression, and then predicts the user's tags from the tags of those close friends. The results outperform the baseline by 80% on P@1, P@5, P@10, and R@20. Tag expansion first constructs pseudo-labeled data and then uses a supervised learning method to estimate the probabilities of the expanded tags. The results outperform all known methods, including random-walk methods: P@1, P@5, P@10, and R@10 exceed the best known methods by more than 10%.

Ranking is the last problem. We propose an improved PageRank method for ranking social network users, and we use nDCG to compare the ranking performance of different categories of algorithms on different types of queries. The results show that authority-based algorithms dominate for academic search keywords, activity-based algorithms are favored for occupational search keywords, and vote-based algorithms are favored for corporate search keywords. We also explored the problem of finding hidden experts: we construct pseudo-labeled data to train supervised learning models, and experiments show that nDCG improves significantly after adding the hidden experts we found.
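A minimal sketch of the two-stage tag prediction idea, assuming the close-friend logistic regression has already been trained and that each candidate friend is described by a small, hypothetical feature vector (for example, mutual follow or interaction counts). Stage one scores every friend with the logistic function and keeps those above a threshold; stage two ranks tags by the summed close-friend probabilities. The features, weights, threshold, and the probability-weighted vote are illustrative assumptions, not the dissertation's exact model.

```python
import math
from collections import defaultdict

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_close_friends(candidates, weights, bias, threshold=0.5):
    """Stage 1: logistic regression over per-friend feature vectors.

    candidates: dict friend_id -> list of float features (hypothetical).
    Returns dict friend_id -> close-friend probability, keeping only
    friends whose probability reaches the threshold.
    """
    close = {}
    for friend, feats in candidates.items():
        p = sigmoid(bias + sum(w * x for w, x in zip(weights, feats)))
        if p >= threshold:
            close[friend] = p
    return close

def predict_tags(close_friends, friend_tags, k=5):
    """Stage 2: rank tags by probability-weighted votes of close friends."""
    scores = defaultdict(float)
    for friend, p in close_friends.items():
        for tag in friend_tags.get(friend, []):
            scores[tag] += p
    # Highest score first; ties broken alphabetically for determinism
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [tag for tag, _ in ranked[:k]]
```

For a user with no tags, `predict_tags(predict_close_friends(...), friend_tags)` yields a ranked tag list that can be scored with P@k and R@k.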
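The evaluation measures named above have standard definitions. A minimal sketch, assuming binary relevance for P@k and R@k, and graded relevance with the common log2 rank discount for nDCG@k; the abstract does not state which nDCG variant or cutoff was used.

```python
import math

def precision_at_k(ranked, relevant, k):
    """P@k: fraction of the top-k ranked items that are relevant."""
    return sum(1 for item in ranked[:k] if item in relevant) / k

def recall_at_k(ranked, relevant, k):
    """R@k: fraction of all relevant items that appear in the top k."""
    if not relevant:
        return 0.0
    return sum(1 for item in ranked[:k] if item in relevant) / len(relevant)

def dcg_at_k(gains, k):
    """DCG: the gain at 1-based rank i is divided by log2(i + 1)."""
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains[:k]))

def ndcg_at_k(ranked, grades, k):
    """nDCG@k: DCG of the ranking divided by DCG of the ideal ordering.

    grades: dict item -> graded relevance (unlisted items count as 0).
    """
    gains = [grades.get(item, 0) for item in ranked]
    ideal = sorted(grades.values(), reverse=True)
    best = dcg_at_k(ideal, k)
    return dcg_at_k(gains, k) / best if best > 0 else 0.0
```

nDCG equals 1.0 only when the ranking places items in non-increasing order of relevance, which is why it suits comparing ranking algorithms across query types.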
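The abstract does not say how the improved PageRank differs from the original, so the sketch below shows only the standard PageRank power iteration over a follow graph that such a method would start from. The damping factor 0.85, the edge direction (a follow edge passes rank to the followed user), and spreading the rank of users who follow nobody uniformly over all users are conventional assumptions, not details from the dissertation.

```python
def pagerank(follows, damping=0.85, iters=100, tol=1e-10):
    """Standard PageRank by power iteration.

    follows: dict user -> list of users they follow; an edge u -> v
    passes part of u's rank to v. Users who follow nobody (dangling
    nodes) spread their rank uniformly over all users.
    """
    users = set(follows)
    for targets in follows.values():
        users.update(targets)
    n = len(users)
    if n == 0:
        return {}
    rank = {u: 1.0 / n for u in users}
    for _ in range(iters):
        dangling = sum(rank[u] for u in users if not follows.get(u))
        # Teleport mass plus redistributed dangling mass for every user
        new = {u: (1.0 - damping) / n + damping * dangling / n for u in users}
        for u, targets in follows.items():
            if targets:
                share = damping * rank[u] / len(targets)
                for v in targets:
                    new[v] += share
        delta = sum(abs(new[u] - rank[u]) for u in users)
        rank = new
        if delta < tol:
            break
    return rank
```

Each iteration preserves total rank mass of 1, so scores can be compared directly across users; an improved variant would typically change the edge weights or the teleport distribution.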
Keywords/Search Tags: Social Networks, Machine Learning, People Search