| Citrus is the most popular fruit category in the world and has significant economic and nutritional value. Research on citrus covers the whole industry chain, and the resulting data form a large knowledge base. Most citrus-related research material, however, exists as unstructured text or books, and relying on manual annotation or conventional text mining to acquire the key information about citrus varieties from such sources has drawbacks. This work builds an information extraction model that automatically extracts key information from unstructured texts describing citrus varieties and uses it to build a knowledge graph of citrus varieties, addressing practical needs such as germplasm resource management and variety planting recommendation in support of the citrus industry. First, we created a citrus ontology and a dataset of citrus variety information. The dataset consists of 1008 entries totaling 269,595 characters, with an average text length of 267 characters. The citrus ontology includes 17 entity types and 18 relationship types among them. Second, because citrus variety names are ambiguous and appear in differing formats, we developed a BiLSTM+CRF-based named entity recognition model for citrus text that automatically recognizes entity information in the text. Experiments show that the model achieves precision, recall, and F1 values of 89.39%, 85.39%, and 87.25%, respectively, higher than those of other commonly used models, giving the best entity recognition performance. Third, an analysis of the citrus variety information dataset revealed overlapping entity relationships in the text. To address this problem, we propose an improved CasRel-based entity relationship extraction model for citrus text that introduces relationship trigger word information to increase the likelihood that the object entities of a triple are recognized. We designed comparison and ablation experiments along several dimensions to evaluate the effectiveness and performance of the improved model. The improved model achieves an F1 value of 79.32%, 3.07 percentage points higher than the baseline model, and its training process is more stable. It also extracts overlapping triples more effectively, outperforming the baseline model in all overlapping cases. Compared with mainstream models, its F1 value is 11.08 percentage points higher than that of TPLinker and 9.04 percentage points lower than that of UIE, but its training time is only 8.88% of TPLinker's and 3.31% of UIE's. Finally, we used the graph database Neo4j to build a knowledge graph of citrus varieties that organizes the links between each variety's fruit tree traits, fruit traits, and pest and disease control, with the aim of providing high-quality reference information for managing citrus production. |
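
To make the notion of overlapping entity relationships concrete, the sketch below labels the triples of one text with the overlap categories commonly used in the relation extraction literature: Normal, SingleEntityOverlap (SEO), and EntityPairOverlap (EPO). It is only an illustration under stated assumptions, not the model or evaluation code of this work; the function name `overlap_types`, the relation names, and the example triples are hypothetical.

```python
def overlap_types(triples):
    """Return the overlap categories present among (head, relation, tail) triples.

    EPO: two distinct triples involve the same pair of entities.
    SEO: two triples share exactly one entity.
    Normal: no two triples share any entity.
    A text can be both EPO and SEO at the same time.
    """
    unique = sorted(set(triples))  # drop exact duplicates
    labels = set()
    for i, (head_a, _, tail_a) in enumerate(unique):
        for head_b, _, tail_b in unique[i + 1:]:
            pair_a, pair_b = {head_a, tail_a}, {head_b, tail_b}
            if pair_a == pair_b:
                labels.add("EPO")
            elif pair_a & pair_b:
                labels.add("SEO")
    return labels or {"Normal"}


# Hypothetical triples: one variety linked to several attribute values (SEO).
example = [
    ("variety_X", "fruit_shape", "oval"),
    ("variety_X", "maturity_period", "late November"),
]
print(overlap_types(example))
```

Texts describing a single variety tend to attach many attributes to the same subject entity, which is why the SEO case is the natural one to illustrate here.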