
Data Extraction Of Minority Subjects In Social Media Based On Knowledge Graph

Posted on: 2020-08-05
Degree: Master
Type: Thesis
Country: China
Candidate: Y Ma
Full Text: PDF
GTID: 2428330575989313
Subject: Computer application technology
Abstract/Summary:
With the development of Internet technology, social media plays an important role in information dissemination. Social media platforms produce massive amounts of data every day, containing rich information from many industries and domains. Extracting domain-specific data from massive social media data is the process of classifying and filtering that data with a variety of data processing models, using known domain knowledge as prior knowledge. It can be applied to public opinion analysis, news dissemination, corporate brand promotion, and commercial marketing, and has important social and commercial value.

This thesis studies the extraction of minority-subject data from social media. It addresses three main problems: how to classify unstructured, multi-topic social media data; how to deal with the sparseness and poor identifiability of minority-subject data; and how to use limited expert knowledge to achieve more accurate and efficient data extraction. To this end, we introduce a knowledge graph (KG) and the LDA (Latent Dirichlet Allocation) model. We obtain news data and user data from social media platforms and use expert knowledge of minority areas as prior knowledge to perform topic classification and content filtering, and then extract minority-topic data. The main work of this thesis includes the following aspects:

1. We use existing expert knowledge of the minority domain. Entries in the entity vocabulary are read as nodes, and entity attributes are read as relations between each node and the domain name. Existing entity relations other than entity attributes are also obtained, yielding structured triples. This provides an initial construction of the minority news knowledge graph.
2. We use TransE (Translating Embedding), a knowledge representation model, to vectorize the triples of the constructed minority news knowledge graph. By calculating the distance between the vectors, we predict missing relations between entities, and thereby realize relation prediction and completion of the domain knowledge graph.
3. Based on the completed minority news knowledge graph and the LDA model, this thesis filters social media data by topic classification and entity vocabulary matching, and extracts news data related to minority topics.

We use the FreeBase dataset, "Today's headline" news data, and "Sina Weibo" public user data for experimental verification and performance testing of the proposed method. The experimental results show that data extraction using the topic classification of the LDA model together with the completed domain knowledge graph can effectively improve the accuracy and coverage of extracting minority-subject data from massive social media data.
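The distance-based relation prediction described in the second item can be illustrated with a minimal sketch. TransE models a triple (head, relation, tail) as a translation, so that head + relation should lie close to tail, and a missing relation can be guessed by choosing the candidate with the smallest distance. The entity names, relation names, and two-dimensional vectors below are hypothetical toy values, not data or code from the thesis; in practice the embeddings would be learned from the knowledge graph triples.

```python
import math

# Toy 2-dimensional embeddings. All names and values are invented for
# illustration only; real embeddings would be learned during training.
entity_vecs = {
    "entity_a": [0.0, 0.0],
    "entity_b": [1.0, 0.0],
    "entity_c": [0.0, 1.0],
}
relation_vecs = {
    "relation_x": [1.0, 0.0],
    "relation_y": [0.0, 1.0],
}


def transe_distance(head, relation, tail):
    """Return the L2 norm of (h + r - t); smaller means more plausible."""
    h = entity_vecs[head]
    r = relation_vecs[relation]
    t = entity_vecs[tail]
    return math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))


def predict_relation(head, tail):
    """Pick the candidate relation with the smallest distance for (head, ?, tail)."""
    return min(relation_vecs, key=lambda rel: transe_distance(head, rel, tail))


if __name__ == "__main__":
    print(predict_relation("entity_a", "entity_b"))
    print(predict_relation("entity_a", "entity_c"))
```

This sketch covers only the scoring and prediction step under the stated assumptions. It uses the L2 norm, although TransE can also be used with the L1 norm, and it omits the training procedure that produces the embeddings.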
Keywords/Search Tags:Social Media, Data Extraction, Knowledge Graph Completion, LDA(Latent Dirichlet Allocation), Representation Learning