Indonesian Text Analysis And Processing For Speech Synthesis

Posted on:2020-08-03

Degree:Master

Type:Thesis

Country:China

Candidate:X Kong

GTID:2428330575985933

Subject:Communication and Information System

Abstract/Summary:

A future direction for artificial intelligence is to enable computers to think, see, speak and feel. Speech synthesis is widely used in applications such as navigation and translation. At present, the dominant form of speech synthesis is still text-to-speech conversion, in which the computer converts text into speech output. Current speech synthesis research focuses on languages such as Chinese and English, and there is little research on Indonesian. Indonesian belongs to the Western Indonesian branch of the Malayo-Polynesian language family. It is written in Latin letters and has explicit word boundaries. This thesis studies syllabification and automatic phoneme segmentation for an Indonesian speech synthesis system, building on speech corpus construction and text normalization in the Indonesian front-end text analysis. The main work is as follows:

(1) Construction of an Indonesian pronunciation corpus. Text is collected from Indonesian websites with existing software, and Python is then used to filter it, removing repeated characters and punctuation, to form the initial corpus. Sentence length and high-frequency words are taken into account when assembling the corpus, which is then evaluated against objective criteria.

(2) Normalization of non-standard words in Indonesian text. The thesis examines the non-standard words that occur in Indonesian text and their possible readings, and derives a normalization method and workflow. Number strings and special characters are normalized by combining regular expressions with keywords, and abbreviations are normalized by regular-expression matching. Experiments show a normalization accuracy of 96.2%.

(3) Syllabification of Indonesian for speech synthesis. The thesis studies Indonesian syllable structure and proposes a syllabification scheme suited to Indonesian speech synthesis. Syllables are divided by reverse maximum matching against a syllable list, supplemented with zero-initial rules. Experiments show accuracies of 98.2% on the in-set test and 97.1% on the out-of-set test.

(4) Phoneme segmentation of Indonesian. A segmentation scheme based on the characteristics of Indonesian is proposed and implemented for Indonesian speech synthesis. The phone list is determined from the vowel structure and phoneme structure of Indonesian, and dictionary matching is used to segment the text of the Indonesian speech corpus into phones, yielding the prosody text.
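
The syllable-list matching described in item (3) can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the function name `syllabify`, the toy `SYLLABLES` set and the single-character fallback are assumptions made here for illustration, and the zero-initial rules mentioned in the abstract are not specified there, so they are not modeled.

```python
def syllabify(word, syllables, max_len=None):
    """Split `word` by right-to-left (reverse) maximum matching against `syllables`."""
    word = word.lower()
    if max_len is None:
        # Longest entry in the syllable list bounds the window size.
        max_len = max(len(s) for s in syllables)

    result = []
    end = len(word)
    while end > 0:
        # Try the longest candidate ending at `end` first, then shorter ones.
        for size in range(min(max_len, end), 0, -1):
            candidate = word[end - size:end]
            if candidate in syllables:
                break
        else:
            # Assumption: if nothing matches, emit one character so the scan
            # always advances. The thesis may handle this case differently.
            size = 1
            candidate = word[end - 1:end]
        result.append(candidate)
        end -= size

    result.reverse()  # syllables were collected from right to left
    return result


# Hypothetical toy syllable list; a real system would use a full inventory.
SYLLABLES = {"ma", "kan", "ba", "ha", "sa", "in", "do", "ne", "si", "a"}

if __name__ == "__main__":
    for w in ["makan", "bahasa", "indonesia"]:
        print(w, "->", "-".join(syllabify(w, SYLLABLES)))
```

Scanning from the right and preferring the longest match means a closed syllable such as "kan" is taken before the shorter "an" would be considered, which is the usual motivation for maximum matching over a greedy shortest match.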

Keywords/Search Tags:

Indonesian, Speech Synthesis, Formation of Corpus, Normalization, Syllabication, Phonemization

Related items

1	Malay Text Analysis For Speech Synthesis
2	Corpus Supported English Text To Speech Synthesis Engine
3	The Research And Realization Of Corpus Based Speech Synthesis System For Uyghur
4	A Study On The Key Technologies Of Web-Based Indonesian-Chinese Parallel Corpus Construction
5	Auto-constructing Speech Corpus With The Limited Text~2
6	Robust Speech Synthesis Based On Small Amount Of Corpus
7	Create An Emotional Speech Synthesis Corpus
8	Research On Statistical Parametric Emotional Speech Synthesis
9	Research On Tibetan Speech Synthesis Based On Deep Learning
10	Research On Automatic Construction Of Speech Corpus And Speech Minimized Labeling