
Research and Practice of Python-Based, Dictionary-Driven Mechanical Word Segmentation Technology for Nuosu (Yi)

Posted on: 2019-04-17  Degree: Master  Type: Thesis
Country: China  Candidate: B M X A  Full Text: PDF
GTID: 2415330590992695  Subject: Chinese Ethnic Language and Literature
Abstract/Summary:
In English text, spaces serve as natural delimiters between words, so word boundaries are visible at a glance. In Yi (Nuosu) text, by contrast, only sentences and paragraphs are marked off by punctuation; words have no formal delimiter. Although English also faces the problem of phrase segmentation, at the word level Yi is far more complicated than English. Yi word segmentation is the process of separating the words of a Yi sentence with explicit symbols, that is, of recombining a continuous syllable sequence into a word sequence according to a defined standard so that clear boundaries appear between words. Language processing applications such as automatic analysis, machine translation, text comprehension, automatic summarization, text proofreading, and automatic annotation all depend on word segmentation. To translate a Yi sentence into another language, for example, one must first extract all of its words, map each extracted word to the corresponding words of the target language, and then reorder the result according to the target grammar. Beyond machine translation, other foundational language processing tasks likewise depend on segmentation, so the segmentation problem must be solved before a computer can process Yi at all.

Starting from the characteristics and grammatical features of the Yi language, this thesis works through the basic resources for segmentation: Yi character encoding, corpus texts, a segmentation standard, and a segmentation dictionary. Drawing on the three mainstream mechanical techniques of maximum forward matching, maximum reverse matching, and maximum bidirectional matching, it implements a complete mechanical segmentation mechanism and framework in the Python environment. First, based on the linguistic features of Yi, the special "negative word" structure is incorporated into the segmentation dictionary, and a segmentation standard better suited to Yi text is defined. Then, after an analysis of the genres of available Yi texts, only novels and folk tales consistent with everyday Yi usage are used as segmentation corpora. After analyzing and comparing the current mainstream segmentation methods, and in light of the difficulties of the task, the reasons for not using statistical or machine learning segmentation are explained. Finally, the segmenter is implemented in Python, and the results of maximum forward, maximum reverse, and maximum bidirectional matching are evaluated and compared.

The resulting system fills out the mechanical, syllable-based segmentation mechanism for Yi. It can serve as a segmentation tool for future large-scale text segmentation, grammatical, and lexical research, and it provides a realistic, objective theoretical basis and data for subsequent research on Yi segmentation.
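To make the three matching strategies concrete, the following minimal Python sketch shows dictionary-based maximum forward, maximum reverse, and maximum bidirectional matching. It is an illustration under stated assumptions rather than the thesis's actual implementation: the dictionary is assumed to be a set of word strings, max_len is the length of the longest dictionary entry, and the bidirectional tie-breaking rule (prefer the result with fewer words, then with fewer single-syllable fragments) is one common heuristic.

def forward_max_match(text, dictionary, max_len):
    # Greedily take the longest dictionary word starting at each position;
    # fall back to a single syllable when nothing longer matches.
    words, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                words.append(candidate)
                i += size
                break
    return words

def backward_max_match(text, dictionary, max_len):
    # The same idea, scanning from the end of the sentence toward the start.
    words, j = [], len(text)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            candidate = text[j - size:j]
            if size == 1 or candidate in dictionary:
                words.insert(0, candidate)
                j -= size
                break
    return words

def bidirectional_max_match(text, dictionary, max_len):
    # Run both scans and keep the segmentation with fewer words; on a tie,
    # prefer the one with fewer single-syllable fragments.
    forward = forward_max_match(text, dictionary, max_len)
    backward = backward_max_match(text, dictionary, max_len)
    if len(forward) != len(backward):
        return forward if len(forward) < len(backward) else backward
    def singles(segmentation):
        return sum(1 for word in segmentation if len(word) == 1)
    return forward if singles(forward) <= singles(backward) else backward

In practice the dictionary would hold Yi syllable strings loaded from the segmentation dictionary described above; any Python set of strings works for experimentation, and each function returns the list of segmented words.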
Although this thesis implements mechanical segmentation of Yi words, in this family of methods the segmentation dictionary largely determines both accuracy and speed: the number of entries in the dictionary directly determines segmentation accuracy, while dictionary size, processor speed, and available memory together determine segmentation speed and overall effectiveness. In future segmentation research, therefore, accuracy can only be improved by expanding the vocabulary covered by the segmentation dictionary, and the heavy dependence of mechanical segmentation on processors and memory can only be reduced through improved algorithms. The main trend for future Yi text processing is to rely on statistical methods and deep learning, which are also the more scientific approaches, so Yi text can only be handled well once soft resources such as dictionaries and corpora are in place. While these basic materials are being built, future Yi language processing can also draw on other current natural language processing methods and move toward speech synthesis, speech analysis, syntactic analysis, semantic analysis, text analysis, and natural language understanding.
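As one concrete direction for the algorithmic improvements mentioned above, the sketch below groups dictionary entries by their first syllable and records the longest entry in each group, so that the forward scan can cap its candidate length at every position and skip needless dictionary probes. This is a hedged illustration, not the thesis's code: the dictionary format (one word per line) and the file name nuosu_dict.txt are assumptions made only for the example.

from collections import defaultdict

def load_grouped_dictionary(path):
    # One word per line; also record, for each starting syllable, the length
    # of the longest dictionary entry that begins with it.
    words = set()
    longest_by_first = defaultdict(int)
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            word = line.strip()
            if not word:
                continue
            words.add(word)
            longest_by_first[word[0]] = max(longest_by_first[word[0]], len(word))
    return words, longest_by_first

def forward_max_match_capped(text, words, longest_by_first):
    # Forward maximum matching whose candidate length at each position is
    # capped by the longest entry starting with the current syllable.
    result, i = [], 0
    while i < len(text):
        cap = min(longest_by_first.get(text[i], 1), len(text) - i)
        for size in range(cap, 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in words:
                result.append(candidate)
                i += size
                break
    return result

For example, words, longest = load_grouped_dictionary("nuosu_dict.txt") followed by forward_max_match_capped(sentence, words, longest) behaves like the plain forward scan but never tries candidate lengths that cannot occur in the dictionary.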
Keywords/Search Tags: Yi Words, Word segmentation standard, Python, Information processing