Mapping Technology For Metadata Relationship And Implementation

Posted on: 2021-11-21
Degree: Master
Type: Thesis
Country: China
Candidate: S H Cheng
GTID: 2518306113461994
Subject: Computer software and theory
Abstract/Summary:
The big data architecture of modern large enterprises is growing increasingly complex, and the volume of data that is collected, processed, used, and retired has increased significantly. This makes it harder for companies to trace where data comes from and what it affects. Some companies have begun to build visual maps of the lineage (literally "blood relationship") among metadata, in order to organize how data is used and to support business queries and development management. Metadata is data that describes data, such as database configuration and table catalog information. Metadata lineage describes a hierarchical structure: which source data a target is derived from, and which downstream data it generates. For example, data A generates data B and C, and B and C in turn generate D, E and F, G respectively. Lineage is usually extracted from the SQL text that operates on the database, so analyzing it requires parsing the relevant SQL syntax.

Existing solutions fall into roughly three categories. The first is large commercial enterprise software, which is expensive to purchase and hard to maintain. The second builds the analysis process on interface functions provided by the enterprise's big data framework; its drawback is that data must be read from the underlying database, which consumes database resources and affects data security. The third uses open-source syntax parsing tools with custom parsing rules; because the database data is not accessible, the parsing granularity reaches only the table level or the pseudo-field level, and a certain SQL writing standard has to be imposed. In addition, the open-source nature of the parser may introduce security risks.

To address these problems, this paper proposes a mapping technology for metadata lineage. It is based on the principles of finite state automata and context-free grammars and on the syntax rules of Hive SQL, the language used by databases in Hadoop, a widely used big data framework. Without relying on open-source parsing tools, a lexical parser and a grammar parser for the lineage-related parts of Hive SQL were designed and developed from the ground up.

The lexical parser contains three modules. The first module filters out comments. The second module performs word segmentation and, following the characteristics of Hive SQL, gives special treatment to contiguous keywords, non-contiguous keywords, escape characters, and parentheses. The third module splits and reorganizes some complex lineage-related grammar so that each piece becomes a single grammatical structure. Compared with ordinary lexical parsing methods, the main feature of this lexical parser is that, based on how Hive SQL is actually written in production environments, the input text is combined and transformed to a certain degree, which greatly reduces the design difficulty of the later grammar parsing.

The grammar parser handles the grammar related to lineage. On the one hand, a top-level parsing architecture was designed to extract lineage from statements that generate data, delete lineage for statements that delete data, and trace lineage. On the other hand, to assist lineage extraction, multiple small, reusable parsers were designed; combined with the table structure data exported from the database, they resolve the standard lineage structure 'target table.target field' -> 'source table.source field'. Compared with ordinary grammar parsing methods, the main features of this grammar parser are that it not only parses the grammatical structure of lineage but also uses multiple parsers to extract, map, and supplement the key lineage information. It can be applied to production environments with irregularly written scripts and complex syntax, and its parsing granularity reaches the field level without occupying database resources.

The parser implemented with the proposed mapping technology therefore has three major characteristics: it does not occupy database resources, it handles complex syntax and irregularly written scripts, and its parsing granularity is at the field level. More than 1,000 scripts from a bank's actual production environment were analyzed and processed, and more than 100,000 lineage relationships were generated. The parser has been deployed in the bank's metadata lineage visualization system and has met the system's expected requirements.
Keywords/Search Tags:Hive SQL analysis, blood relationship, field level, complex syntax and function
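
As a rough illustration of the pipeline described in the abstract, the sketch below filters comments with a small state machine, segments the text into tokens, and then reads off target and source tables. It is not the thesis's implementation. It is a much-simplified, table-level sketch that assumes a single `INSERT ... SELECT` statement, while the thesis reports field-level results using exported table structure data. All function names (`strip_comments`, `tokenize`, `table_lineage`) and the sample script are hypothetical.

```python
import re

def strip_comments(sql: str) -> str:
    """Remove `--` line comments and `/* */` block comments using a small
    state machine; text inside quoted strings is left untouched."""
    out = []
    state = "code"  # one of: code, line_comment, block_comment, string
    quote = ""
    i = 0
    while i < len(sql):
        ch = sql[i]
        nxt = sql[i + 1] if i + 1 < len(sql) else ""
        if state == "code":
            if ch == "-" and nxt == "-":
                state = "line_comment"
                i += 2
                continue
            if ch == "/" and nxt == "*":
                state = "block_comment"
                i += 2
                continue
            if ch in ("'", '"'):
                state, quote = "string", ch
            out.append(ch)
        elif state == "line_comment":
            if ch == "\n":  # the newline ends the comment and is kept
                state = "code"
                out.append(ch)
        elif state == "block_comment":
            if ch == "*" and nxt == "/":
                state = "code"
                out.append(" ")  # keep neighbouring tokens separated
                i += 2
                continue
        else:  # inside a quoted string
            out.append(ch)
            if ch == "\\" and nxt:  # escaped character: copy it verbatim
                out.append(nxt)
                i += 2
                continue
            if ch == quote:
                state = "code"
        i += 1
    return "".join(out)

# Quoted strings, backtick identifiers, (dotted) names, numbers, punctuation.
TOKEN = re.compile(
    r"'(?:\\.|[^'\\])*'|\"(?:\\.|[^\"\\])*\"|`[^`]*`"
    r"|[A-Za-z_][\w.]*|\d+|[^\s\w]"
)

def tokenize(sql: str) -> list:
    """Split comment-free SQL text into a flat list of tokens."""
    return TOKEN.findall(strip_comments(sql))

def table_lineage(sql: str) -> list:
    """Return (target_table, source_table) pairs for one INSERT ... SELECT.
    Simplification: a source is any name directly after FROM or JOIN."""
    tokens = tokenize(sql)
    upper = [t.upper() for t in tokens]
    target, sources = None, []
    for i, tok in enumerate(upper):
        if tok == "INSERT":
            j = i + 2  # skip INTO / OVERWRITE
            if j < len(tokens) and upper[j] == "TABLE":
                j += 1
            if j < len(tokens):
                target = tokens[j]
        elif tok in ("FROM", "JOIN") and i + 1 < len(tokens):
            name = tokens[i + 1]
            if name != "(" and name not in sources:  # subqueries are skipped
                sources.append(name)
    return [(target, s) for s in sources]

# Hypothetical sample script; the table names are made up for illustration.
SAMPLE = """
-- load daily summary
INSERT OVERWRITE TABLE dw.d /* target */
SELECT a.id, '--not a comment' AS note
FROM ods.a a
JOIN ods.b b ON a.id = b.id;
"""

if __name__ == "__main__":
    for target, source in table_lineage(SAMPLE):
        print(target, "<-", source)
```

Extending a sketch like this to the field level would require the column lists of each table, which is why the abstract pairs the parsers with table structure data exported from the database rather than querying the database directly.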