Research On Semantic Label Extraction And Recommendation Of Earth Science Data

Posted on: 2024-03-29  Degree: Master  Type: Thesis
Country: China  Candidate: J Y Wang
GTID: 2530307136976159  Subject: Civil Engineering and Water Conservancy (Professional Degree)
Abstract/Summary:
Earth science data is rich in semantic information, which opens up vast space for geoscience exploration but also poses challenges for data sharing. The lack of semantic understanding and association makes it difficult for users to find data that meets their needs among complex, massive, multi-source, and heterogeneous earth science data. This thesis takes the shared data sets of the Big Earth Data Science Project of the Chinese Academy of Sciences as its research object. It extracts labels through word segmentation and weight ranking of the data text, uses deep learning methods to recommend related labels and related data, and constructs a knowledge graph. The main conclusions are as follows:
(1) An automatic label extraction workflow for Big Earth Data was constructed. After comparing the Chinese Academy of Sciences word segmentation system with jieba, jieba was selected for its better results and used to segment and clean the Big Earth Data text. The STKOS standard term library was added as a user-defined dictionary to improve the accuracy of the results. TextRank was then used to compute weights for the segmented words, and words with high weights were selected as data labels. This method increased the number of labels from 1485 to 2215 and also improved the semantic standardization of the label terms.
(2) A data recommendation function based on labels and abstract text was implemented. At the label level, Word2Vec, chosen for its fast computation and broad applicability, was used to vectorize the data labels, and the cosine similarity between label vectors was calculated. The similarities were ranked, and the five most relevant labels were recommended. At the abstract level, Doc2Vec, which requires no annotated training data and takes semantic relations into account, was used to vectorize the data abstracts. The cosine similarity between text vectors was calculated, and the data sets with the highest similarity were recommended as related data.
(3) A data knowledge graph was built and a data recommendation application was implemented. Elements of Big Earth Data were integrated, including the subject field, data format, data labels, release time, and other descriptive information. The knowledge graph was built in the Neo4j graph database and contains 3902 nodes and 10758 edges. Automatic label extraction and data recommendation were connected to the Big Earth Data sharing service system through a prototype system, demonstrating the feasibility and application potential of the proposed method.
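The label-level recommendation step in conclusion (2) can be illustrated with a minimal sketch: rank all other labels by cosine similarity to a query label and keep the top five. This is not the thesis implementation. The function names `cosine_similarity` and `recommend_labels`, the label names, and the three-dimensional vectors are hypothetical toy values chosen for illustration; in the described workflow the vectors would come from a trained Word2Vec model.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        # A zero vector has no direction; treat it as unrelated.
        return 0.0
    return dot / (norm_u * norm_v)


def recommend_labels(query, label_vectors, top_k=5):
    """Return the top_k labels most similar to `query`, best first."""
    query_vec = label_vectors[query]
    scored = [
        (label, cosine_similarity(query_vec, vec))
        for label, vec in label_vectors.items()
        if label != query  # never recommend the query label itself
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical toy vectors, NOT output of a real Word2Vec model.
toy_vectors = {
    "glacier": [0.9, 0.1, 0.0],
    "snow cover": [0.8, 0.2, 0.1],
    "permafrost": [0.7, 0.3, 0.2],
    "land use": [0.1, 0.9, 0.2],
    "vegetation": [0.2, 0.8, 0.3],
    "sea level": [0.3, 0.1, 0.9],
    "ocean salinity": [0.1, 0.2, 0.9],
}

recommendations = recommend_labels("glacier", toy_vectors)
for label, score in recommendations:
    print(f"{label}: {score:.3f}")
```

The same ranking applies at the abstract level if the label vectors are swapped for document vectors, since only the source of the vectors changes, not the similarity measure.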
Keywords/Search Tags:Earth science, data sharing, semantic tags, similarity recommendation, knowledge graph, metadata