Ontology-based Topic Model Framework Research In The Computational Materials Science Domain

Posted on: 2021-10-20
Degree: Master
Type: Thesis
Country: China
Candidate: T Zhang
Full Text: PDF
GTID: 2518306569996629
Subject: Software engineering
Abstract/Summary:
With the continued development of human society, demand for advanced materials is growing across all industries, and research in materials science is deepening accordingly. Computational materials science is the branch that explores materials science using computational methods. As this research deepens, however, the volume of materials-related research data keeps growing, and each research institution builds its own materials information management system. The resulting diversity of storage structures makes data structures ambiguous and complicates data access and integration.

To make data logically organized and reusable, researchers have borrowed the concept of ontology from philosophy to describe the context and structure of data. A domain ontology consists mainly of the most representative set of concepts and set of relations in the field, where the relation set also includes axioms. Because computational materials science builds on materials science, its domain ontology can be obtained by extending the domain ontology of materials science.

To this end, this thesis proposes an improved phrase-based topic model framework. First, the framework includes a new frequent phrase mining algorithm that combines a word frequency threshold, part-of-speech information, and other methods to segment and revise the raw text and obtain a set of high-frequency phrases. Second, the framework includes an improved phrase-based latent Dirichlet allocation (LDA) topic model. By introducing part-of-speech information, the model counts word frequencies within a phrase twice, which increases the influence of certain words on the topic distribution of the whole phrase. This improved topic model framework yields a representative phrase set, that is, the concept set for the field of computational materials science. Analyzing this phrase-form concept set then yields the relation set for the field. After validation by domain experts, the thesis extends the concepts and relations of the materials science ontology.

Finally, the thesis reports experiments and analysis on the titles and abstracts of more than 9000 articles in the field. Comparison of the experimental results with existing algorithms demonstrates the practicality of the framework.
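The two ideas summarized in the abstract, frequent phrase mining with a frequency threshold and part-of-speech filter, and part-of-speech-aware word counting, can be illustrated with a minimal sketch. This is not the thesis's algorithm: the toy part-of-speech table, the ADJ/NOUN pattern, the function names, and the reading of "counted twice" as "nouns contribute a count of two" are all assumptions made only for illustration.

```python
from collections import Counter

# Hypothetical toy part-of-speech lookup (an assumption for this sketch);
# a real pipeline would use a trained tagger instead.
TOY_POS = {
    "density": "NOUN", "functional": "ADJ", "theory": "NOUN",
    "band": "NOUN", "gap": "NOUN", "predicts": "VERB", "the": "DET",
    "of": "ADP", "silicon": "NOUN", "and": "CCONJ",
}
ALLOWED = {"NOUN", "ADJ"}


def mine_frequent_phrases(docs, min_count=2, max_len=3):
    """Return n-grams (2..max_len tokens) that occur at least min_count
    times, contain only ADJ/NOUN tokens, and end in a NOUN."""
    counts = Counter()
    for doc in docs:
        tokens = doc.lower().split()
        for n in range(2, max_len + 1):
            for i in range(len(tokens) - n + 1):
                gram = tuple(tokens[i:i + n])
                tags = [TOY_POS.get(t) for t in gram]
                if all(tag in ALLOWED for tag in tags) and tags[-1] == "NOUN":
                    counts[gram] += 1
    # Apply the word frequency threshold.
    return {gram: c for gram, c in counts.items() if c >= min_count}


def pos_weighted_counts(phrase, boosted_tag="NOUN"):
    """Count each word of a phrase once, or twice if it carries boosted_tag
    (one assumed interpretation of counting word frequency twice)."""
    counts = Counter()
    for tok in phrase:
        counts[tok] += 2 if TOY_POS.get(tok) == boosted_tag else 1
    return counts


docs = [
    "density functional theory predicts the band gap of silicon",
    "the band gap and density functional theory",
]
phrases = mine_frequent_phrases(docs)
weights = pos_weighted_counts(("density", "functional", "theory"))
```

In this sketch the weighted counts would replace the plain word counts fed to a topic model, so that nouns pull harder on the topic assignment of the phrase they belong to.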
Keywords/Search Tags: Domain Ontology, Text Mining, LDA Topic Model, Perplexity