Malay Text Analysis For Speech Synthesis

Posted on: 2019-07-14
Degree: Master
Type: Thesis
Country: China
Candidate: M F Shi
GTID: 2428330548473445
Subject: Communication and Information System
Abstract/Summary:
In recent years, with the rapid development of artificial intelligence, speech synthesis technology has matured and is gradually being applied in many aspects of daily life, such as reading software and voice broadcasting. At present, speech synthesis is mainly implemented as text-to-speech conversion, that is, converting text to speech by computer. However, research on speech synthesis focuses mainly on widely used languages, and there is relatively little research on Malay. With the aim of developing a Malay speech synthesis system, this thesis studies and implements corpus construction, text normalization, and syllable division in the front-end text analysis of Malay. The main work is as follows:
(1) Construction of Malay corpora. Existing software is used to download Malay text from Malay websites; illegal characters and duplicate data are removed, and the cleaned result serves as the Malay text corpus. Based on this text corpus, sentences for the recorded pronunciation corpus are selected by a principle that combines high-frequency words with sentence length. Finally, the reasonableness and representativeness of the pronunciation corpus are verified against evaluation criteria.
(2) Normalization of numeric characters in Malay text. The thesis studies the types of special characters and the ambiguities that often appear in Malay text, and proposes a normalization scheme and algorithm flow for Malay. Using a combination of regular expressions and keywords, numbers in sentences and the special characters used together with numbers are normalized. In the experiment, the normalization accuracy is 95.13%.
(3) Division of Malay syllables. Drawing on existing Malay syllabification schemes, the thesis proposes its own syllabification scheme and designs an algorithm flow to implement it, using a combination of rules and syllable lists. In the experiments, syllable division accuracy reaches 100.00% on the in-set test and 96.40% on the out-of-set test.
In summary, the proposed front-end text analysis methods for Malay corpus construction, normalization, and syllable division achieve the expected results and basically meet the requirements of a Malay speech synthesis system.
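The abstract does not give the normalization rules themselves, so the following is only a minimal sketch of how regular expressions and keywords might be combined to expand numbers in Malay text. The rule table `KEYWORD_RULES`, the helper names `malay_number`, `spell` and `normalize`, the choice of the keywords "RM" and "%", and the digit-by-digit reading of long or zero-prefixed digit strings are all assumptions made for illustration, not the thesis's actual scheme.

```python
import re

# Malay words for the digits 0-9.
UNITS = ["kosong", "satu", "dua", "tiga", "empat",
         "lima", "enam", "tujuh", "lapan", "sembilan"]


def _join(head: str, rest: int) -> str:
    """Append the spelled-out remainder, if there is one."""
    return head if rest == 0 else head + " " + malay_number(rest)


def malay_number(n: int) -> str:
    """Spell out an integer 0 <= n < 1,000,000 in Malay words."""
    if n < 0 or n >= 1_000_000:
        raise ValueError("this sketch only supports 0..999999")
    if n < 10:
        return UNITS[n]
    if n == 10:
        return "sepuluh"
    if n == 11:
        return "sebelas"
    if n < 20:
        return UNITS[n - 10] + " belas"
    if n < 100:
        tens, rest = divmod(n, 10)
        return _join(UNITS[tens] + " puluh", rest)
    if n < 1000:
        hundreds, rest = divmod(n, 100)
        head = "seratus" if hundreds == 1 else UNITS[hundreds] + " ratus"
        return _join(head, rest)
    thousands, rest = divmod(n, 1000)
    head = "seribu" if thousands == 1 else malay_number(thousands) + " ribu"
    return _join(head, rest)


def spell(digits: str) -> str:
    """Read a digit string as a number, or digit by digit (assumed rule)
    when it is longer than six digits or starts with a zero."""
    if len(digits) > 6 or (len(digits) > 1 and digits[0] == "0"):
        return " ".join(UNITS[int(d)] for d in digits)
    return malay_number(int(digits))


# Hypothetical keyword rules, applied in order; the bare-number rule is last
# so that it only sees digits the keyword rules did not consume.
KEYWORD_RULES = [
    (re.compile(r"RM\s?([0-9]+)"), "{} ringgit"),   # currency prefix
    (re.compile(r"([0-9]+)\s?%"), "{} peratus"),    # percent sign
    (re.compile(r"([0-9]+)"), "{}"),                # plain number
]


def normalize(text: str) -> str:
    """Replace numbers and number-attached symbols with Malay words."""
    for pattern, template in KEYWORD_RULES:
        text = pattern.sub(
            lambda m, t=template: t.format(spell(m.group(1))), text)
    return text


print(normalize("Harga RM50 naik 20% pada 2019"))
```

Decimals, dates, times and ordinals are deliberately left out; a fuller scheme would need further patterns and some disambiguation for them.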
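The combination of rules and a syllable list described for syllable division can be illustrated in the same hedged way. The sketch below assumes a simple rule set (consonant digraphs such as "ng" and "ny" count as one unit, a single consonant between vowels starts the next syllable, and a consonant cluster is split before its last consonant) plus a small exception list for words the rules get wrong. The names `DIGRAPHS`, `EXCEPTIONS` and `syllabify` and the single list entry are hypothetical; the thesis's actual rules and lists are not given in the abstract.

```python
VOWELS = set("aeiou")
# Two-letter consonants treated as one unit (assumed inventory).
DIGRAPHS = ("ng", "ny", "sy", "kh", "gh")
# Hypothetical exception list: words whose division the rules below miss,
# here a word with a diphthong that must stay in one syllable.
EXCEPTIONS = {"pulau": ["pu", "lau"]}


def tokenize(word: str) -> list[str]:
    """Split a word into letters, keeping digraphs together."""
    units, i = [], 0
    while i < len(word):
        if word[i:i + 2] in DIGRAPHS:
            units.append(word[i:i + 2])
            i += 2
        else:
            units.append(word[i])
            i += 1
    return units


def syllabify(word: str) -> list[str]:
    """Divide a word into syllables: list lookup first, then rules."""
    word = word.lower()
    if word in EXCEPTIONS:
        return list(EXCEPTIONS[word])
    units = tokenize(word)
    nuclei = [i for i, u in enumerate(units) if u in VOWELS]
    if not nuclei:
        return [word]
    syllables, start = [], 0
    for cur, nxt in zip(nuclei, nuclei[1:]):
        gap = nxt - cur - 1  # consonant units between two vowels
        # 0 or 1 consonant: cut right after the vowel;
        # 2 or more: only the last consonant opens the next syllable.
        cut = cur + 1 if gap <= 1 else nxt - 1
        syllables.append("".join(units[start:cut]))
        start = cut
    syllables.append("".join(units[start:]))
    return syllables


for w in ("makan", "bunga", "tinggi", "sekolah", "pulau"):
    print(w, "-".join(syllabify(w)))
```

Checking the list before applying the rules keeps the rule set small while still allowing known exceptions to be corrected one entry at a time.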
Keywords/Search Tags: Malay, speech synthesis, corpus construction, normalization, syllable