User Information Extraction And Analysis In Big Data Environment

Posted on: 2019-04-12
Degree: Master
Type: Thesis
Country: China
Candidate: K Q Wang
Full Text: PDF
GTID: 2348330542998767
Subject: Computer technology
Abstract/Summary:
With the rapid development of information technology, valuable knowledge is scattered, explicitly or implicitly, across massive volumes of data, which makes it harder for people to acquire that knowledge efficiently. User information has significant commercial value for vertical search, automatic question answering, and personalized recommendation, but it is mostly presented as unstructured text. How to convert such massive data into structured form, extract the target information accurately, and build further analysis on the extracted information has become an active research direction. This thesis studies user information extraction in a big data environment and conducts further analysis based on the extracted user information.

First, this thesis proposes a scheme for accurately extracting entity relations from unstructured text. The method builds a person-domain knowledge base from an interactive encyclopedia and uses the knowledge base to label the corpus through distant supervision. The corpus is then refined by expanding the set of relation words and by computing semantic similarity. A hybrid model that combines a bidirectional LSTM with a CNN and incorporates dependency relations is used to extract entity relations, which improves the accuracy of person relation extraction.

Second, from the perspective of multi-source features, this thesis implements information extraction for users of social networks. Starting from feature engineering on Weibo user data, user features are extracted from several perspectives, including numerical features, topic features, and text features. In addition, the user's network structure features are extracted from the social network information. The user information extraction problem is cast as a multi-class classification problem. Experiments with a cascade model based on semi-supervised learning extract the user's occupational relationship and improve the accuracy of relation extraction in the social network environment.

Finally, building on the extracted user occupation information, a talent circle discovery framework is designed. Within the framework, user similarity is computed along several dimensions, including basic user features, spatio-temporal features, semantic features, text features, and network features, and the Logistic Regression algorithm is used to determine the weights of the different feature types. From these per-dimension similarities, a method for computing a comprehensive user similarity is designed. The DBSCAN algorithm is then applied to optimize the selection of initial points for the K-means algorithm, which improves the accuracy of talent circle discovery.
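The abstract does not say how DBSCAN is used to choose the initial points for K-means. One plausible reading, sketched below under stated assumptions, is to run DBSCAN first, take the centroid of each density cluster as a seed, and let K-means start from those seeds, so that the number of clusters and their starting positions come from the data instead of from random initialization. The function names, the use of Euclidean distance on plain coordinate tuples, and the `eps` / `min_pts` values are all illustrative assumptions; the thesis itself works with a comprehensive user similarity that is not reproduced here.

```python
import math
from collections import deque


def region_query(points, i, eps):
    """Indices of all points within distance eps of points[i] (including i)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Plain DBSCAN. Returns one label per point; -1 marks noise."""
    labels = [None] * len(points)  # None means "not visited yet"
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = -1  # provisional noise, may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = deque(neighbors)
        while queue:
            j = queue.popleft()
            if labels[j] == -1:
                labels[j] = cluster  # noise reachable from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = region_query(points, j, eps)
            if len(reachable) >= min_pts:  # j is a core point: keep expanding
                queue.extend(reachable)
    return labels


def centroid(pts):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(axis) / len(pts) for axis in zip(*pts))


def kmeans(points, centers, max_iter=100):
    """Lloyd's K-means started from the given centers."""
    centers = list(centers)
    assign = []
    for _ in range(max_iter):
        assign = [
            min(range(len(centers)), key=lambda c: math.dist(p, centers[c]))
            for p in points
        ]
        updated = []
        for c in range(len(centers)):
            members = [p for p, a in zip(points, assign) if a == c]
            updated.append(centroid(members) if members else centers[c])
        if updated == centers:  # converged
            break
        centers = updated
    return assign, centers


def dbscan_seeded_kmeans(points, eps, min_pts):
    """Assumed scheme: DBSCAN cluster centroids become the K-means seeds."""
    labels = dbscan(points, eps, min_pts)
    k = max(labels) + 1
    if k == 0:
        raise ValueError("DBSCAN found no dense cluster; adjust eps/min_pts")
    seeds = [
        centroid([p for p, lab in zip(points, labels) if lab == c])
        for c in range(k)
    ]
    return kmeans(points, seeds)


# Toy data (hypothetical): two dense groups and one outlier.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (5, 20)]
assign, centers = dbscan_seeded_kmeans(points, eps=1.5, min_pts=3)
print(assign)
```

In this toy run DBSCAN marks the outlier as noise, so it does not distort the seeds, while K-means still assigns it to the nearest of the two resulting clusters. Replacing `math.dist` with a distance derived from a weighted user similarity would be the natural adaptation to the setting described in the abstract.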
Keywords/Search Tags: Relation extraction, Distant supervision, Semantic similarity, Multi-source features, Talent circle