User Information Extraction And Analysis In Big Data Environment

Posted on:2019-04-12

Degree:Master

Type:Thesis

Country:China

Candidate:K Q Wang

GTID:2348330542998767

Subject:Computer technology

Abstract/Summary:

With the rapid development of information technology, valuable knowledge is scattered, explicitly or implicitly, across massive volumes of data, which makes it harder for people to acquire knowledge efficiently. User information has significant commercial value for vertical search, automatic question answering and personalized recommendation, yet it is mostly presented as unstructured text. How to convert such massive data into structured data, extract the target information more accurately, and carry out further research on the extracted information has therefore become an active research direction. This thesis studies user information extraction in a big data environment and conducts further analysis based on the extracted user information.

First, this thesis proposes a scheme for accurately extracting entity relations from unstructured text. Using an interactive encyclopedia, the method builds a domain knowledge base of persons and uses the knowledge base to label the corpus by distant supervision. The corpus is then optimized through relation-word expansion and semantic similarity calculation. In addition, a hybrid model of bidirectional LSTM and CNN that integrates dependency relations is used to extract entity relations, which improves the accuracy of person relation extraction.

Second, from the perspective of multi-source features, this thesis implements information extraction for social network users. Based on Weibo user data, it starts with feature engineering and extracts user features from different perspectives, including numerical features, topic features and text features; network structure features are additionally extracted from the users' social network information. The user information extraction problem is cast as a multi-class classification problem. Experiments with a cascade model based on semi-supervised learning, used to extract users' occupational relations, show improved accuracy of relation extraction in the social network environment.

Finally, building on the extracted user occupation information, a talent circle discovery framework is designed. User similarity in the framework is calculated along several dimensions, including basic user features, spatio-temporal features, semantic features, text features and network features, and the Logistic Regression algorithm is used to determine the weights of the different feature types. On this basis, a method for calculating comprehensive user similarity is designed, and the DBSCAN algorithm is applied to optimize the initial point selection of the K-means algorithm, which improves the accuracy of talent circle discovery.
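The distant-supervision labeling step mentioned in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the `Triple` type, the `label_corpus` function and the sample sentences are hypothetical, and entity matching is reduced to plain substring search. The idea shown is only the core assumption of distant supervision, namely that a sentence mentioning both entities of a knowledge-base triple is labeled with that triple's relation.

```python
from typing import List, NamedTuple, Tuple


class Triple(NamedTuple):
    """One knowledge-base fact (hypothetical structure for illustration)."""
    head: str
    tail: str
    relation: str


def label_corpus(
    sentences: List[str], knowledge_base: List[Triple]
) -> List[Tuple[str, str, str, str]]:
    """Label each sentence that mentions both entities of a triple.

    Returns (sentence, head, tail, relation) tuples. Matching is a naive
    substring test; a real system would use entity recognition instead.
    """
    labeled = []
    for sentence in sentences:
        for triple in knowledge_base:
            if triple.head in sentence and triple.tail in sentence:
                labeled.append(
                    (sentence, triple.head, triple.tail, triple.relation)
                )
    return labeled


# Toy knowledge base and corpus (illustrative data only).
kb = [Triple("Marie Curie", "Pierre Curie", "spouse")]
sentences = [
    "Marie Curie married Pierre Curie in 1895.",
    "Marie Curie and Pierre Curie shared a Nobel Prize in 1903.",
    "Marie Curie was born in Warsaw.",
]

labeled = label_corpus(sentences, kb)
for sentence, head, tail, relation in labeled:
    print(f"{relation}: {sentence}")
```

In this toy corpus the second sentence receives the `spouse` label even though it does not express that relation. Noise of this kind is what a corpus optimization step, such as the relation-word expansion and semantic similarity filtering described in the abstract, is meant to reduce.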

Keywords/Search Tags:

Relation extraction, Distant supervision, Semantic similarity, Multi-source features, Talent circle

Related items

1	A Chinese Entity Relation Extraction Method Based On Distant Supervision
2	Research On Entity Relation Extraction Method Based On Distant Supervision
3	Distant Supervised Entity Relation Extraction Method And Application Based On Internal And External Semantic Features And Preferential Attention Mechanism
4	Sample Denoising And Model Optimization In Distant Supervision For Relation Extraction
5	Multi-level Weight Optimization Based Distant Supervision Relation Extraction
6	Research On Key Technology Of Relation Extraction Based On Distant Supervision
7	Research On Distant Supervision Relation Extraction Technology With Pre-trained Language Models
8	Research On Relation Extraction Model Based On Distant Supervision
9	Neural Relation Extraction Based On Distant Supervision Approaches
10	Research On Relation Extraction Based On Distant Supervision Labeled Data