| Rice is one of the most important food crops in the world, and rice multi-omics is among the most competitive areas of multi-omics research, second only to research on the human genome. To study complex biological processes in rice, researchers need to comprehensively analyze multi-omics data and other biological data drawn from multiple rice databases. However, the rice databases constructed so far are multi-source and heterogeneous, and "data silos" are common in the rice field, so it is difficult for researchers to quickly obtain and integrate useful information from massive rice data. An effective technology is therefore needed to uniformly convert and integrate massive multi-source heterogeneous rice data and thereby improve its integration and utilization. Ontologies are a basic tool for integrating big data, and a standardized top-level ontology for rice would facilitate the unified labeling of multi-source heterogeneous rice data. However, the rice field currently lacks a standard top-level ontology with wide coverage. Automatic knowledge extraction is an effective ontology-based technique for integrating multi-platform data. Although existing technologies and systems can perform automatic knowledge extraction, in the absence of known semantic models current methods either perform poorly or require a huge amount of computation, which makes them unsuitable for massive rice data that lack semantic annotations. To solve these problems, and motivated by the ability of knowledge graphs to efficiently organize and express massive multi-source heterogeneous data, we propose a knowledge-graph-based automatic knowledge extraction approach for the rice field. The specific research content is as follows: (1) A novel knowledge-graph-based approach to automatic knowledge extraction for structured data sources is
proposed. First, an initial candidate semantic model is generated through automatic semantic labeling and the Steiner tree algorithm. Then, to improve the accuracy of the initial model, our approach uses the knowledge graph as prior knowledge: it applies a decision tree algorithm to "move" incorrect relationships in the candidate model, leverages graph matching to remove incorrect substructures from the candidate model, and uses frequent subgraph mining to search for and complete the entities and relationships missing from the candidate model. Finally, high-quality semantic models are output. Experimental results show that, compared with two state-of-the-art automatic knowledge extraction systems, Karma and PGM-SM, our approach achieves the highest average F1 score when only a few semantic models are known: 0.917, 0.894, and 0.883 on three different data sets, respectively. (2) A standardized rice genomics reference connecting ontology is constructed from eight public rice databases and domain knowledge. Based on this ontology, some structured rice genomics data sources in these databases are semantically annotated, yielding a set of gold-standard semantic models of rice structured data and a rice genomics knowledge graph. (3) Automatic semantic modeling of new rice structured data sources is carried out using the constructed rice genomics reference connecting ontology, the gold-standard semantic models of rice structured data, and the rice knowledge graph. The final experimental results show that, compared with two state-of-the-art automatic semantic modeling systems, Karma and PGM-SM, our method achieves the highest average F1 score on the rice data set, 0.776. In summary, to overcome the difficulty of integrating multi-source heterogeneous data in the rice field, we propose an automatic semantic modeling method for structured data and apply it to the automatic knowledge extraction task of
structured data sources in the rice field. This work lays an important foundation for data integration in the rice field. |
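
The candidate-generation step in (1) connects the ontology classes selected by semantic labeling through a Steiner tree. The following is only a minimal sketch of that idea, not the implementation evaluated above: it assumes the ontology is available as an undirected weighted graph and uses a simple greedy shortest-path heuristic in place of whichever Steiner tree algorithm the approach actually employs. The class names, edge weights, and the helpers `build_graph`, `shortest_paths`, and `approximate_steiner_tree` are all hypothetical and chosen for illustration.

```python
import heapq


def build_graph(weighted_edges):
    """Build an undirected adjacency map {node: {neighbor: weight}}."""
    graph = {}
    for u, v, w in weighted_edges:
        graph.setdefault(u, {})[v] = w
        graph.setdefault(v, {})[u] = w
    return graph


def shortest_paths(graph, sources):
    """Multi-source Dijkstra; returns distance and parent maps."""
    dist = {s: 0 for s in sources}
    parent = {s: None for s in sources}
    heap = [(0, s) for s in sources]
    heapq.heapify(heap)
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale queue entry
        for v, w in graph[u].items():
            if d + w < dist.get(v, float("inf")):
                dist[v] = d + w
                parent[v] = u
                heapq.heappush(heap, (d + w, v))
    return dist, parent


def approximate_steiner_tree(graph, terminals):
    """Greedy heuristic: repeatedly attach the terminal nearest to the tree."""
    terminals = list(terminals)
    tree_nodes = {terminals[0]}
    tree_edges = set()
    remaining = set(terminals[1:]) - tree_nodes
    while remaining:
        dist, parent = shortest_paths(graph, tree_nodes)
        reachable = [t for t in remaining if t in dist]
        if not reachable:
            raise ValueError("terminals are not connected in the graph")
        nearest = min(reachable, key=lambda t: (dist[t], t))
        # Walk back to the tree, adding every edge on the shortest path.
        node = nearest
        while parent[node] is not None:
            tree_edges.add(frozenset((node, parent[node])))
            tree_nodes.add(node)
            node = parent[node]
        remaining -= tree_nodes
    return tree_edges


# Hypothetical ontology fragment; a lower weight means a more plausible link.
ontology_edges = [
    ("Gene", "Chromosome", 1),
    ("Gene", "Protein", 1),
    ("Protein", "Trait", 1),
    ("Gene", "Trait", 3),
    ("Chromosome", "Trait", 5),
]
graph = build_graph(ontology_edges)

# Classes assumed to have been assigned to the source columns by semantic labeling.
candidate = approximate_steiner_tree(graph, ["Gene", "Chromosome", "Trait"])
for edge in sorted(tuple(sorted(e)) for e in candidate):
    print(edge)
```

In this toy graph the heuristic brings in `Protein` as an intermediate class, because the path from `Gene` to `Trait` through `Protein` is cheaper than the direct link. A candidate model of this kind is what the later refinement steps (decision tree, graph matching, frequent subgraph mining) would then correct using the knowledge graph.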