
The Design And Implementation Of Baidu Crowdtesting Platform's Labeling System And Its Extended Application In Data Collection

Posted on: 2019-06-08  Degree: Master  Type: Thesis
Country: China  Candidate: M C Xu  Full Text: PDF
GTID: 2348330542999772  Subject: Engineering
Abstract/Summary:
In recent years, the Internet industry in China and abroad has seen a surge in AI development. AI relies heavily on deep learning algorithms, and most current deep learning is still supervised, so fields such as face recognition, user behavior analysis, and intelligent driving all require massive training data sets produced through data labeling and collection. Baidu has followed the AI trend and has already formed a relatively complete AI portfolio, in which the labeling and collection of data is an indispensable foundation. To address the two main difficulties in data labeling and collection, guaranteeing data quality and reducing cost, this thesis builds the data labeling subsystem of the Baidu Crowdtesting Platform, which consists of three modules: a labeling module, an admin module, and an extended collection module. The system uses several mechanisms to ensure data quality, including sample-question injection, auditing, and fitting. At the same time, the Baidu Crowdtesting Platform on which the system runs puts crowdsourcing into practice, drawing on the capabilities and resources of the public to reduce data production costs. The project provides Baidu with a low-cost, high-quality source of training data, which saves development cost and helps guarantee product quality. In the implementation, the front end uses the mature and stable Angular2 framework, and the server side uses PHP's Yii framework, a purely object-oriented framework with an MVC structure that lets developers focus on business logic. To compensate for PHP's limited support for asynchronous development, the project uses the Crontab facility under Linux. The project also uses Redis to cache some business information, reducing the access pressure on the relational database and improving system performance. At present, the system is running smoothly and provides a large amount of accurate data for the company's AI product lines, including Apollo and DuerOS.
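The abstract names sample-question injection as one of the quality mechanisms but does not describe how it works. The following is a minimal sketch of one common way such a check can be built, not the thesis's actual implementation: questions with known answers are mixed into a worker's batch, and the batch is accepted only if the worker's accuracy on those questions reaches a threshold. The names `Question`, `build_batch`, `sample_accuracy`, and `accept_batch`, and the 0.8 threshold, are all assumptions made for illustration.

```python
import random
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Question:
    qid: str
    # Known answer for an injected sample question; None for a real task.
    gold_label: Optional[str] = None


def build_batch(tasks: List[Question], samples: List[Question],
                n_samples: int, rng: random.Random) -> List[Question]:
    """Mix n_samples known-answer questions into the real tasks, in random order."""
    batch = list(tasks) + rng.sample(list(samples), n_samples)
    rng.shuffle(batch)
    return batch


def sample_accuracy(batch: List[Question], answers: Dict[str, str]) -> float:
    """Fraction of injected sample questions that the worker answered correctly."""
    gold = [q for q in batch if q.gold_label is not None]
    if not gold:
        raise ValueError("batch contains no sample questions")
    correct = sum(1 for q in gold if answers.get(q.qid) == q.gold_label)
    return correct / len(gold)


def accept_batch(batch: List[Question], answers: Dict[str, str],
                 threshold: float = 0.8) -> bool:
    """Accept the worker's labels only if sample-question accuracy meets the threshold."""
    return sample_accuracy(batch, answers) >= threshold
```

In a sketch like this, the worker cannot tell sample questions from real tasks, so accuracy on the known-answer subset serves as an estimate of accuracy on the whole batch. Rejected batches could then be routed to an audit step rather than discarded.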
Keywords/Search Tags:AI, Training Data Set, Labeling, Collection, Yii Framework