
Schema-rich Heterogeneous Information Network Based Text Feature Construction Research And Application

Posted on: 2019-04-09
Degree: Master
Type: Thesis
Country: China
Candidate: X H Cao
GTID: 2348330542998165
Subject: Computer Science and Technology
Abstract/Summary:
In data mining, social network analysis, and especially the analysis of heterogeneous information networks, has attracted growing attention from researchers. A knowledge graph is a knowledge base with semantic properties, constructed by extracting knowledge from text data on the Internet and stored in RDF format; it can be regarded as a special schema-rich heterogeneous information network. Knowledge graphs were first used to improve search results, but as the technology has developed, research on them has diversified, and a growing range of problems can be addressed with the rich semantic information they contain.

Text analysis is an active topic in data mining. Common tasks include text classification, text clustering, and entity set expansion. Commonly used text analysis algorithms mine implicit features from the text itself, such as lexical, semantic, and syntactic features. In practice, however, texts are often too short, data are insufficient, and annotation is difficult, which leads to poor extracted features and degrades the results of text analysis. Because of its rich semantic information, a knowledge graph can serve as an auxiliary or independent data source for text analysis. Based on knowledge graphs, this thesis carries out the following two pieces of research and application work in text analysis.

First, this thesis proposes a knowledge graph based text feature construction method, Meta Path-Based Text Feature Construction (MeTeCo). The algorithm maps appropriate words in a document to entities in the knowledge graph and applies a purpose-designed bidirectional path generation algorithm to find suitable meta path features that capture latent knowledge relationships within the text. Combined with traditional Bag-of-Words text features, these yield new text features that incorporate meta paths. The effectiveness of MeTeCo is verified by experiments against other text features on real data sets.

Second, this thesis proposes a knowledge graph based entity set expansion method, the Concatenated Meta Path based Entity Set Expansion method (CoMeSE). The algorithm treats the knowledge graph as an independent data source for entity set expansion. Using an improved path discovery algorithm, the random walk based Concatenated Meta Path Generation method (RWCP), it discovers important meta paths between seed entities. The method then introduces a novel concept, the Multi-Type-Constrained Meta Path (MuTyPath), to describe path characteristics in the knowledge graph more precisely, and designs a new similarity measure, the Multi-Type-Constrained Meta Path-based Similarity measure (MuTySim), to quantify the semantic features of paths. On this basis, the algorithm applies heuristic learning and positive-unlabeled (PU) learning to measure the importance of path features and builds an entity set expansion model. Finally, the effectiveness, efficiency, and stability of CoMeSE are verified by experiments on a real data set in comparison with existing entity set expansion algorithms.
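
To make the idea of fusing meta path features with Bag-of-Words features more concrete, the following is a minimal sketch on a toy knowledge graph. It is not the MeTeCo algorithm from the thesis: the triples, the `same_party` meta path, the exact-match entity linking, and all function names are hypothetical and chosen only for illustration. A meta path is represented here as a sequence of (relation, direction) steps, and the feature value is the number of path instances that connect entities mentioned in the same document.

```python
from collections import Counter
from itertools import permutations

# Hypothetical toy knowledge graph as (head, relation, tail) triples.
TRIPLES = [
    ("obama", "memberOf", "democrats"),
    ("clinton", "memberOf", "democrats"),
    ("bush", "memberOf", "republicans"),
    ("obama", "bornIn", "hawaii"),
]
ENTITIES = {h for h, _, _ in TRIPLES} | {t for _, _, t in TRIPLES}

# Assumed representation: a meta path is a tuple of (relation, direction)
# steps, where +1 follows an edge head -> tail and -1 follows it tail -> head.
META_PATHS = {
    "same_party": (("memberOf", +1), ("memberOf", -1)),
}


def neighbors(node, relation, direction):
    """Return nodes reachable from `node` over one `relation` edge."""
    if direction == +1:
        return [t for h, r, t in TRIPLES if h == node and r == relation]
    return [h for h, r, t in TRIPLES if t == node and r == relation]


def count_instances(source, target, meta_path):
    """Count path instances from source to target that follow meta_path."""
    frontier = Counter({source: 1})
    for relation, direction in meta_path:
        next_frontier = Counter()
        for node, n_paths in frontier.items():
            for nb in neighbors(node, relation, direction):
                next_frontier[nb] += n_paths
        frontier = next_frontier
    return frontier[target]


def build_features(text):
    """Bag-of-Words counts plus one count feature per meta path."""
    tokens = text.lower().split()
    features = Counter(tokens)  # plain Bag-of-Words part
    # Naive entity linking (an assumption): exact token match on entity names.
    linked = sorted({tok for tok in tokens if tok in ENTITIES})
    for name, meta_path in META_PATHS.items():
        total = sum(
            count_instances(a, b, meta_path)
            for a, b in permutations(linked, 2)
        )
        if total:
            features["MP:" + name] = total
    return features


if __name__ == "__main__":
    print(build_features("obama met clinton"))
    print(build_features("obama met bush"))
```

In this sketch, two documents with almost identical words receive different feature vectors once the knowledge graph is consulted, because only the first document mentions two entities linked by the `same_party` meta path. How the thesis selects meta paths and weights them is not specified in the abstract and is not modeled here.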
Keywords/Search Tags:knowledge graph, schema-rich heterogeneous information network, text feature construction, entity set extension