
Research On Database Schema Mapping And Approximate Multi String Matching For Crowdsourcing Data

Posted on: 2019-05-16
Degree: Master
Type: Thesis
Country: China
Candidate: Z L Zhao
GTID: 2428330566982897
Subject: Electronic and communication engineering
Abstract/Summary:
With the rise of crowdsourcing as a new economic model, crowdsourcing platforms of many types are springing up all over the world. Lightweight crowdsourcing platforms built on NoSQL databases are the most widely used. Their efficient read-write performance and support for distributed storage let them cope with the storage demands of large data volumes. However, their weakly structured storage model also has disadvantages, such as poor universality and weak support for data operations. How to process large volumes of crowdsourcing data efficiently through transactional operations has therefore become the main focus of current research. Because NoSQL databases are poorly suited to transactional operations, a large body of work has tried to address this problem by studying transformations between data structures. However, there has been little systematic research on schema mapping from NoSQL databases to relational databases. This thesis takes the Recital crowdsourcing platform as its research object. It studies a scheme for mapping the crowdsourcing data schema to MySQL, and an algorithm for identifying and matching crowdsourcing data that contains multiple strings. The specific research contents and results are as follows:
(1) First, the thesis analyzes the design principles of crowdsourcing systems and compares the MongoDB database used by the lightweight crowdsourcing platform with relational databases, laying the theoretical groundwork for the schema mapping scheme.
(2) The thesis studies string comparison algorithms and introduces the basic principles of Edit Distance and the Needleman-Wunsch algorithm, which provide theoretical support and a basis for comparison for the multi-string fuzzy matching algorithm.
(3) For the migration of data from MongoDB to MySQL, the thesis proposes a schema mapping scheme built from two modules, data migration and data mapping. The migration plan for MySQL accounts for the differences in data types and database structure between the two environments, and the mapping process is analyzed in detail using pseudo-code.
(4) For the problem of multi-string extraction, a multi-string fuzzy matching algorithm based on edit distance is proposed. Borrowing the idea of convolution, the degree of similarity is calculated position by position using the edit distance, and the matching results are screened according to the corresponding output criteria. In selecting the output threshold parameter, a higher threshold gives more accurate matching results but also increases the matching time, so the optimal threshold is selected by considering the overall accuracy of the matching. Experiments show that the proposed edit-distance-based algorithm is more accurate than the Needleman-Wunsch algorithm and takes much less time in approximate multiple string matching and single string extraction.
Keywords/Search Tags:crowdsourcing, MongoDB, MySQL, schema mapping, Edit distance, Multi string matching
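
The abstract does not give the algorithm itself, so the following is only a minimal sketch of the general idea described in item (4): slide a window over the text, score each window against each pattern with the edit distance, and keep results that pass a threshold. The function names, the normalization of the distance into a similarity score, and the rule of keeping one best window per pattern are all assumptions made for illustration, not the thesis's method.

```python
from typing import Dict, List, Optional, Tuple


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Assumed normalization: 1.0 for identical strings, 0.0 for fully different."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest


def match_patterns(text: str, patterns: List[str],
                   threshold: float = 0.8
                   ) -> Dict[str, Optional[Tuple[int, float, str]]]:
    """Slide a pattern-sized window over the text (the 'convolution' idea).

    For each pattern, return (start, score, window) for the best window whose
    score reaches the threshold, or None if no window qualifies.
    """
    results: Dict[str, Optional[Tuple[int, float, str]]] = {}
    for pattern in patterns:
        width = len(pattern)
        best: Optional[Tuple[int, float, str]] = None
        for start in range(max(len(text) - width, 0) + 1):
            window = text[start:start + width]
            score = similarity(pattern, window)
            if score >= threshold and (best is None or score > best[1]):
                best = (start, score, window)
        results[pattern] = best
    return results


if __name__ == "__main__":
    sample = "please delivr the parcel to bejing today"
    for name, hit in match_patterns(sample, ["deliver", "beijing", "shanghai"],
                                    threshold=0.7).items():
        print(name, hit)
```

In this sketch the threshold only filters which windows are reported, so it illustrates the accuracy side of the trade-off mentioned in the abstract but not the timing side, which depends on details of the original algorithm that are not stated here.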