
Design And Implementation Of Deep Learning-based Open Speech System For Innovative Enterprises

Posted on: 2021-05-04
Degree: Master
Type: Thesis
Country: China
Candidate: Y Liu
Full Text: PDF
GTID: 2428330614972387
Subject: Software engineering
Abstract/Summary:
After decades of development and the breakthrough of deep learning, automatic speech recognition (ASR) technology is maturing: recognition accuracy has improved greatly, and the technology is now ready for large-scale commercial use. ASR has been widely adopted in industries such as finance, healthcare, court trials, automotive, and tourism, where it plays an important role. However, deep learning-based ASR still faces several problems on the path to industrialization. Among them, the high threshold of cost and technical expertise has become an important factor restricting the rapid development of speech recognition technology. To let more start-up companies obtain free speech technology, and thereby reduce the friction costs the speech industry incurs when putting the technology into production, an open system that supports enterprise-level private engines is urgently needed. Following software engineering methodology and the specific business needs of the company where the internship took place, the author built an open speech recognition system with modules for online speech recognition trial, speech service resource download, language model customization, and model performance testing. The system is built on the Django web development framework and served by lightweight Nginx and uWSGI services. After extensive research on speech recognition, the author selected the TDNN-f algorithm and the Chain training criterion, which allow efficient modeling, for acoustic model training, and used an n-gram model together with hot-word addition for language model training and optimization. Word error rate (WER) serves as the indicator of recognition accuracy, to ensure the recognition quality and working efficiency of the ASR engine. The author participated in and completed the design of the system architecture and the implementation of the online speech recognition module, the resource download module, the language model adaptive training module, and the model performance test module, and assisted in the construction of the speech recognition engine. Finally, testing verified that the functions of each module of the system meet the requirements. At present, the system is online and operating stably, and has been adopted by many manufacturers to provide convenient speech services for many startups.
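For readers unfamiliar with the WER metric named in the abstract, the following is a minimal sketch of how it is commonly computed: the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and the recognizer output, divided by the number of reference words. The function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration; the abstract does not describe the thesis's own scoring tool.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative only: tokens are split on whitespace, which is an
    assumption and not taken from the thesis.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution (sat -> sit) and one deletion (the) over six words.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Because insertions count as errors, the value can exceed 1.0 when the hypothesis is much longer than the reference.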
Keywords/Search Tags: automatic speech recognition, open source system, language model, Django, TDNN-f