
Research On Mining Geographic Location Attributes Of Characters Based On Social Text Data

Posted on: 2022-04-05  Degree: Master  Type: Thesis
Country: China  Candidate: M Y Chen
GTID: 2518306524492434  Subject: Master of Engineering
Abstract/Summary:
With the development of the Internet and the spread of mobile devices, users rely on social networks more every day. Weibo is one of the largest social platforms in China, and a large body of Weibo-based research has emerged accordingly, such as disaster detection and management based on Weibo topics, analysis of the movement trends of Weibo user groups, and the application of microblogs in the public security system. Such research can improve the speed and efficiency of society's response to emergencies. However, reliable geographic location information is difficult to obtain for most users, which prevents these results from being widely applied. To address this, this thesis mines users' geographic location attributes from text data on the Weibo platform, covering data collection, preprocessing, and location inference. The specific work is as follows.

First, because the Weibo platform lacks a corpus with geographic location tags, a crawler strategy is designed based on a study of the characteristics of Weibo text data and user data, and two kinds of data are collected: (1) Weibo posts with geographic location tags, which form a post corpus used as the basis for building a Weibo text location prediction model. (2) User-related information (the user's profile, historical posts, and social network, together with the profiles and historical posts of the accounts the user follows), which forms a target user data set used as the basis for inferring the user's main activity location.

Second, social text is very short, colloquial, and noisy. Features related to geographic location are highly sparse and there are too few feature terms, which leads to low location prediction accuracy. In response, this thesis designs a new text preprocessing method: (1) On top of conventional text cleaning, a cleaning method based on UF-TF-ICF-W is used to clean the corpus further and increase the density of location-related information in it. (2) Dictionaries of urban points of interest and of dialect terms are built and introduced to improve the accuracy of word segmentation. (3) Targeted word segmentation correction rules are designed to raise the weights of feature terms that are strongly correlated with geographic location, strengthening the location signal of those terms. (4) An improved feature selection method for Weibo text data (CHI-TF-IDF) is proposed to reduce the feature dimension and speed up model computation.

Finally, this thesis builds a microblog text location prediction model based on the Naive Bayes algorithm and, on top of this model, proposes a method for predicting a user's main activity location based on a weighted voting mechanism. In the final experiment, the fused accuracy reached 78% at the municipal level and 82% at the provincial level.
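The last step, per-post location prediction with Naive Bayes followed by a weighted vote over a user's posts, can be illustrated with a minimal Python sketch. This is not the thesis's implementation: the abstract does not specify the voting weights, so weighting each post's vote by the classifier's posterior probability is an assumption made here, and the class name `NaiveBayesLocator`, the function `predict_user_location`, and the toy tokens are hypothetical. The preprocessing steps (UF-TF-ICF-W cleaning, CHI-TF-IDF selection) are omitted and posts are assumed to be already tokenized.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesLocator:
    """Multinomial Naive Bayes over tokenized posts, with Laplace smoothing."""

    def fit(self, docs, labels):
        # Count posts per location and token occurrences per location.
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(tokens)
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict_proba(self, tokens):
        """Return a dict mapping each location to its posterior probability."""
        total_docs = sum(self.class_counts.values())
        log_scores = {}
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)
            for w in tokens:
                if w in self.vocab:  # tokens never seen in training are skipped
                    score += math.log((counts[w] + 1) / denom)
            log_scores[label] = score
        # Normalize in log space to avoid underflow on longer posts.
        top = max(log_scores.values())
        exp_scores = {k: math.exp(v - top) for k, v in log_scores.items()}
        norm = sum(exp_scores.values())
        return {k: v / norm for k, v in exp_scores.items()}


def predict_user_location(model, posts):
    """Weighted vote: each post votes for its top location.

    Assumption: the vote weight is the post's posterior probability;
    the thesis's actual weighting scheme is not given in the abstract.
    """
    votes = defaultdict(float)
    for tokens in posts:
        probs = model.predict_proba(tokens)
        best = max(probs, key=probs.get)
        votes[best] += probs[best]
    if not votes:
        return None  # no posts, no prediction
    return max(votes, key=votes.get)


# Toy usage with made-up tokens and city labels.
train_docs = [["hotpot", "chengdu"], ["panda", "chengdu"],
              ["bund", "shanghai"], ["metro", "bund"]]
train_labels = ["Chengdu", "Chengdu", "Shanghai", "Shanghai"]
model = NaiveBayesLocator().fit(train_docs, train_labels)
user_posts = [["panda"], ["hotpot", "chengdu"], ["bund"]]
print(predict_user_location(model, user_posts))
```

Weighting by posterior means a confidently located post counts for more than an ambiguous one. Other weights, for example ones that treat the user's own posts differently from those of followed accounts, would fit the same voting loop.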
Keywords/Search Tags:Weibo, short-text, text cleaning, feature selection, geolocation prediction