
Research On Construction Method Of Entity Semantic Vector In Science And Technology Field

Posted on: 2018-05-29
Degree: Master
Type: Thesis
Country: China
Candidate: T K Zhang
GTID: 2348330515962811
Subject: Computer technology
Abstract/Summary:
Internet literature search plays a crucial role in today's research activities, but the sheer volume of scientific literature calls for efficient and intelligent search technology. Traditional literature search relies mainly on keyword matching under various retrieval conditions; it is inconvenient to use, lacks semantic understanding, and often returns unsatisfactory results. The underlying cause is the lack of a knowledge representation for scientific literature. Traditional text representations fail to capture the relationships among features within texts, which causes a great loss of semantic information. With the research and wide application of deep learning, phrase-structure vector representations based on word embedding are receiving increasing attention and have been applied successfully to Natural Language Processing tasks in various fields. This thesis studies the semantic vectors of entities in the science and technology field, which mainly involves constructing a domain dictionary and training domain word embeddings. To make literature search in this field more efficient and intelligent, the thesis studies a keyword extraction algorithm and weight calculation combined with the word embedding model, a scientific text embedding model built on multi-granularity semantics, and an expert search model based on the scientific text embedding representation.

(1) Dictionary construction method for the science and technology field. Constructing a domain dictionary is the prerequisite step for mining semantics. The thesis obtains scientific data through distributed crawlers; on the basis of traditional scientific dictionaries, it forms the vocabulary of the field by integrating third-party lexicons and generates word2vec embeddings.

(2) Keyword extraction algorithm based on word clustering. The algorithm combines the word embedding model with the K-Means clustering algorithm: it computes each word's weight by combining the distance between the word and its cluster center with the word's TF-IDF value, then extracts keywords according to that weight.

(3) Scientific text embedding model with multi-granularity mixed semantics. Traditional document vectorization approaches capture only global features or only local features. The thesis proposes a multi-granularity mixed-semantics text embedding model. It obtains a text embedding that carries word-weight information by linearly weighting the words in a document, obtains a text embedding that describes topic information across documents by using a topic model, and concatenates the two vectors. By increasing the dimensionality of the vector space, it integrates local features with global features so as to better represent scientific text.

(4) Expert search model based on scientific text embedding. The model treats an expert as a combination of multiple documents, modeling the expert through the papers and patents the author has published and the projects the author has participated in. It characterizes an expert's different abilities through different combinations of documents, obtains the expert embedding by linearly weighting the documents with weight factors, and finally completes the search task using vector similarity.

A scientific knowledge search platform was developed on the basis of the above research results.
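The abstract does not give formulas, so the following is only a minimal sketch of how steps (2) to (4) might fit together, not the thesis's actual implementation. Several things here are assumptions made for illustration: the weight formula `tfidf / (1 + distance)`, the use of cosine similarity as the vector similarity, the toy random word vectors standing in for trained word2vec embeddings, the fixed `topic_vec` standing in for a topic-model output, and all function names.

```python
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    """Plain K-Means on the rows of X; returns (centers, labels)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(iters):
        labels = np.linalg.norm(X[:, None] - centers[None], axis=2).argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):  # keep the old center if a cluster is empty
                centers[j] = X[labels == j].mean(axis=0)
    labels = np.linalg.norm(X[:, None] - centers[None], axis=2).argmin(axis=1)
    return centers, labels

def keyword_weights(word_vecs, tfidf, k=2):
    """Step (2): combine distance to the cluster center with TF-IDF.
    ASSUMPTION: weight = tfidf / (1 + distance); the source gives no formula."""
    centers, labels = kmeans(word_vecs, k)
    dist = np.linalg.norm(word_vecs - centers[labels], axis=1)
    return tfidf / (1.0 + dist)

def text_embedding(word_vecs, weights, topic_vec):
    """Step (3): weighted average of word vectors (local features)
    concatenated with a topic vector (global features)."""
    local = (weights[:, None] * word_vecs).sum(axis=0) / weights.sum()
    return np.concatenate([local, topic_vec])

def expert_embedding(doc_vecs, doc_weights):
    """Step (4): linear weighting of an expert's document embeddings."""
    w = np.asarray(doc_weights, dtype=float)
    return (w[:, None] * np.asarray(doc_vecs, dtype=float)).sum(axis=0) / w.sum()

def cosine(a, b):
    """ASSUMPTION: cosine similarity is used to rank experts."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy data: 6 words with 4-dimensional vectors (stand-ins for word2vec output).
rng = np.random.default_rng(42)
word_vecs = rng.normal(size=(6, 4))
tfidf = np.array([0.9, 0.1, 0.5, 0.3, 0.7, 0.2])

weights = keyword_weights(word_vecs, tfidf, k=2)
top_keywords = np.argsort(weights)[::-1][:3]   # indices of the 3 heaviest words

topic_vec = np.array([0.7, 0.2, 0.1])          # hypothetical topic distribution
doc = text_embedding(word_vecs, weights, topic_vec)

# An expert built from two documents with hypothetical weight factors 2 and 1.
expert = expert_embedding([doc, 0.5 * doc], [2.0, 1.0])
print(top_keywords, doc.shape, cosine(expert, doc))
```

Concatenation is what makes the document vector longer than either component, which matches the abstract's remark about increasing the dimensionality to hold local and global features side by side.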
Keywords/Search Tags: Word embedding, Semantic representation model, Text embedding, Expert search model