| Word segmentation is an important step in the front-end text analysis of a speech synthesis system. Segmenting the corpus text improves the naturalness of the synthesized speech, because it determines whether the speech follows human pronunciation rules and sounds fluent. Most Indo-European languages have a natural delimiter (the space) between words, which makes word boundaries easy to identify. Dai text has no such delimiter, so this paper addresses how to determine word boundaries in long passages of Dai text. Although many segmentation methods exist, they fall into two main categories: mechanical (dictionary-based) segmentation and statistics-based segmentation. The former has relatively low accuracy, and its speed also depends on the size of the dictionary, which makes its results unsatisfactory. Segmenting Dai text with statistical methods is therefore a question worth studying. This paper adopts the conditional random field (CRF) model to study Dai word segmentation. The work completed is as follows: (1) the role of segmentation in speech synthesis is described, and the two segmentation methods above are introduced with reference to Chinese and English segmentation methods; (2) three common statistical models, HMM, MEMM and CRF, are compared to show the advantages of CRF for Dai labeling and segmentation; (3) with initials and finals as feature items, Dai characters are compiled, a Dai dictionary is constructed, and a C++ program is written to make preliminary labels of the feature items and their position information; (4) on a CRF platform, model training and predictive segmentation are performed, the segmentation algorithm is ported to the Visual Studio 2010 platform through a dynamic link library, and the results are reported. The experimental results show that applying the conditional random field model to Dai text 
segmentation achieves higher accuracy: precision P was 91.05%, recall R was 93.2%, and the F-measure (F with β = 1) was 92.34%. These results meet the basic requirements of Dai word segmentation and give the synthesized speech better naturalness. |
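The abstract reports P, R and F for CRF segmentation built on position labels, but it does not state the label set or how the scores were computed. The following is a minimal Python sketch under stated assumptions, not the paper's implementation. It assumes a B/M/E/S position-tagging scheme (a common choice for CRF-based segmentation, not confirmed by the paper) and word-level scoring in which a predicted word counts as correct only if its exact span matches a gold word. The function names and the syllable strings `s1`…`s6` are hypothetical placeholders, not real Dai text. CRF training and decoding, which the paper performs on a CRF platform, are not shown.

```python
from typing import List, Set, Tuple

# Assumption: each word is a list of syllable units; tags are B/M/E/S.

def to_bmes(words: List[List[str]]) -> List[Tuple[str, str]]:
    """Label each unit with its position in its word: B(egin), M(iddle), E(nd) or S(ingle)."""
    labeled = []
    for word in words:
        if len(word) == 1:
            labeled.append((word[0], "S"))
        else:
            labeled.append((word[0], "B"))
            for unit in word[1:-1]:
                labeled.append((unit, "M"))
            labeled.append((word[-1], "E"))
    return labeled

def from_bmes(units: List[str], tags: List[str]) -> List[List[str]]:
    """Rebuild words from a (possibly predicted) tag sequence."""
    words, current = [], []
    for unit, tag in zip(units, tags):
        if tag in ("B", "S") and current:  # a new word starts: close any unfinished word
            words.append(current)
            current = []
        current.append(unit)
        if tag in ("E", "S"):  # the word ends at this unit
            words.append(current)
            current = []
    if current:  # tolerate a sequence that ends without E or S
        words.append(current)
    return words

def spans(words: List[List[str]]) -> Set[Tuple[int, int]]:
    """Represent a segmentation as a set of (start, end) unit offsets."""
    result, start = set(), 0
    for word in words:
        result.add((start, start + len(word)))
        start += len(word)
    return result

def prf(gold: List[List[str]], pred: List[List[str]]) -> Tuple[float, float, float]:
    """Word-level precision, recall and F-measure (beta = 1) by exact span match."""
    g, p = spans(gold), spans(pred)
    correct = len(g & p)
    precision = correct / len(p) if p else 0.0
    recall = correct / len(g) if g else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

For example, with a hypothetical gold segmentation `[["s1","s2"],["s3"],["s4","s5","s6"]]` and prediction `[["s1","s2"],["s3","s4"],["s5","s6"]]`, only the first word matches, so P, R and F are each 1/3. Scoring by spans rather than by word strings ensures that a repeated word is matched only at the position where it actually occurs.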