
Metadata Management And Application In Massive Network Data Environment

Posted on: 2018-03-31
Degree: Master
Type: Thesis
Country: China
Candidate: S Y Chang
GTID: 2348330518496040
Subject: Information and Communication Engineering
Abstract/Summary:
Metadata is data that describes other data. The term originated in 1969, when it was proposed by Jack E. Myers. It was first published in the "Directory Interchange Format" document written by the United States National Aeronautics and Space Administration (NASA) in 1987. Metadata has been under development for nearly thirty years. In recent years, with the development of big data processing and data analysis technologies, the accuracy of metadata has attracted attention because it affects analysis results. In massive network data environments, metadata usage scenarios have grown explosively. At the same time, many problems and challenges have emerged: the number of metadata categories is growing, the volume of metadata is increasing, the coupling between different types of metadata is tightening, and the timeliness and consistency of metadata cannot be guaranteed.
This thesis designs and implements a metadata management system for the massive network data environment, including its system architecture, software architecture, database, and REST API, using Java EE, JPA, JDBC, and other technologies. The system provides users with a unified and consistent operating interface and platform, and gives metadata platform managers a complete version management strategy and a disaster recovery mechanism. In addition, it supports hierarchical rights management and log queries, which makes metadata management more efficient and convenient.
IP addresses and domain names are important metadata. Existing IP repositories do not accurately store information that associates IP addresses with domain names. This thesis compares and analyzes the data coverage, data coincidence rate, and data credibility of existing IP resource libraries, and then designs and implements an IP resource library based on the metadata management system.
The thesis then studies the correspondence between IP addresses and domain names. By comparing and analyzing edit distance, the LCS algorithm, the GST algorithm, Hamming distance, and adjacency, it determines the method for calculating domain name similarity. It also describes the data generation and data insertion algorithms used when storing IP and domain name data.
Finally, this thesis analyzes the IP and domain name data held in the metadata management system, covering overall traffic, the number of access users, the average traffic per user, the average traffic per visit, and geographical characteristics. Several common user interest patterns are revealed and analyzed using clustering algorithms.
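As a rough illustration of two of the string measures named above, the following Python sketch computes edit distance and Hamming distance between domain names. The abstract does not state which measure the thesis finally adopts or how scores are normalized, so the function names and the length-based normalization in `domain_similarity` are assumptions made only for this example.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed with a rolling one-row DP table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca from a
                curr[j - 1] + 1,     # insert cb into a
                prev[j - 1] + cost,  # substitute ca with cb (or match)
            ))
        prev = curr
    return prev[-1]


def hamming_distance(a: str, b: str) -> int:
    """Number of differing positions; only defined for equal-length strings."""
    if len(a) != len(b):
        raise ValueError("Hamming distance requires equal-length strings")
    return sum(x != y for x, y in zip(a, b))


def domain_similarity(a: str, b: str) -> float:
    """Hypothetical similarity score in [0, 1] derived from edit distance.

    Dividing by the longer length is an assumption of this sketch,
    not the method described in the thesis.
    """
    a, b = a.lower(), b.lower()
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest


if __name__ == "__main__":
    print(edit_distance("example.com", "examp1e.com"))
    print(hamming_distance("example.com", "examp1e.com"))
    print(domain_similarity("example.com", "example.org"))
```

Hamming distance only applies to names of equal length, which is one reason an edit-distance-style measure is often more convenient for domain names of varying length.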
Keywords/Search Tags: massive network data, metadata, metadata management, flow analysis