
The Research And Implementation Of Entity Identification Subsystem In The Data Management System Of Quality And Quantity

Posted on: 2014-11-03
Degree: Master
Type: Thesis
Country: China
Candidate: R Huo
Full Text: PDF
GTID: 2268330422451699
Subject: Computer technology
Abstract/Summary:
With the development of information technology, the problems caused by “big data” and “dirty data” have aroused widespread concern, and the joint management of data quality and quantity has become a research hotspot. Entity identification in massive data is one of the key issues in quality-quantity management. Existing entity identification systems mostly target general-purpose data, and a practical system for massive data has yet to be seen. Against the background of massive data integration, and with the goal of providing an entity identification solution for massive data, this thesis presents a prototype entity identification subsystem for massive data, covering requirements analysis, system design, and system implementation. The prototype supports the data management system of quality and quantity.

First, the requirements analysis of the subsystem is given. The entity identification problem and related techniques are introduced, along with the massive data processing technologies Map-Reduce and Hadoop. We describe the processes and data flow of system operation, and analyze the functions of the overall system and of each module.

Then, the detailed design of the system is given, including the architecture of the subsystem and of each module. Based on the system requirements, the system is divided into modules for initial clustering, entity identification, entity division, probability calculation, and algorithm evaluation, and the core algorithm, processing logic, and interface of each module are given.

Finally, we elaborate on the system implementation. Implementation details of the five functional modules are introduced, including descriptions of data structures, key functions, and interfaces. Experiments verify the efficiency, precision, and recall of the system and evaluate the entity identification results.
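The abstract does not specify the algorithms used in each module, so the following is only a minimal sketch of how a grouping step followed by within-group comparison might be arranged in a map/reduce style, with a pairwise precision and recall check at the end. Everything in it is an assumption made for illustration: the prefix-based `blocking_key`, the token-level `jaccard` similarity, the `threshold` value, and the sample records are all hypothetical, and the map, shuffle, and reduce steps are simulated in plain Python rather than written against the Hadoop API.

```python
from collections import defaultdict
from itertools import combinations


def blocking_key(record):
    # Hypothetical grouping key: first three lowercase characters of the name.
    return record["name"].strip().lower()[:3]


def map_phase(records):
    """Emit (key, record) pairs, as a mapper would."""
    for record in records:
        yield blocking_key(record), record


def shuffle(pairs):
    """Group values by key, standing in for the framework's shuffle step."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def jaccard(a, b):
    """Token-level Jaccard similarity between two strings."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    union = tokens_a | tokens_b
    return len(tokens_a & tokens_b) / len(union) if union else 0.0


def reduce_phase(block, threshold=0.5):
    """Compare every pair inside one group and emit matching id pairs."""
    for r1, r2 in combinations(block, 2):
        if jaccard(r1["name"], r2["name"]) >= threshold:
            yield tuple(sorted((r1["id"], r2["id"])))


def identify(records, threshold=0.5):
    """Run the simulated map, shuffle, and reduce steps end to end."""
    matches = set()
    for block in shuffle(map_phase(records)).values():
        matches.update(reduce_phase(block, threshold))
    return matches


def precision_recall(predicted, truth):
    """Pairwise precision and recall against a set of true matching pairs."""
    true_positives = len(predicted & truth)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


# Made-up sample data; ids 1 and 2, and ids 4 and 5, refer to the same entity.
records = [
    {"id": 1, "name": "Acme Data Systems Ltd"},
    {"id": 2, "name": "Acme Data Systems"},
    {"id": 3, "name": "Acme Paper Company"},
    {"id": 4, "name": "Globex Corporation"},
    {"id": 5, "name": "Globex Corp"},
]
truth = {(1, 2), (4, 5)}

predicted = identify(records)
precision, recall = precision_recall(predicted, truth)
print(predicted, precision, recall)
```

Grouping first means that only records sharing a key are compared, which is what keeps the number of comparisons manageable on large inputs. The cost shows up in recall: in this sample the abbreviated name in record 5 falls below the similarity threshold, so one true pair is missed.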
Keywords/Search Tags: entity identification, massive data, Hadoop, Map-Reduce