
Research and Practice of Python-Based, Dictionary-Driven Mechanical Word Segmentation Technology for Nuosu (Yi)

Posted on: 2019-04-17  Degree: Master  Type: Thesis
Country: China  Candidate: B M X A  Full Text: PDF
GTID: 2415330590992695  Subject: Chinese Ethnic Language and Literature
Abstract/Summary:
In English text, spaces serve as natural delimiters between words, so word boundaries are visible at a glance. In Yi (Nuosu) text, by contrast, only sentences and paragraphs are marked off by punctuation; words have no formal delimiter. Although English also faces the problem of phrase segmentation, at the word level Yi is far more complicated than English. Yi word segmentation is the process of separating the words of a Yi sentence with explicit symbols, that is, of recombining a continuous syllable sequence into a word sequence according to a defined standard so that clear boundaries appear between words. Language processing applications such as automatic analysis, machine translation, text comprehension, automatic summarization, text proofreading, and automatic annotation all depend on word segmentation. To translate a Yi sentence into another language, for example, one must first extract all of its words, map each extracted word to the corresponding words of the target language, and then reorder the result according to the target grammar. Beyond machine translation, other foundational language processing tasks likewise depend on segmentation, so the segmentation problem must be solved before a computer can process Yi at all.

Starting from the characteristics and grammatical features of the Yi language, this thesis works through the basic resources for segmentation: Yi character encoding, corpus texts, a segmentation standard, and a segmentation dictionary. Drawing on the three mainstream mechanical techniques of maximum forward matching, maximum reverse matching, and maximum bidirectional matching, it implements a complete mechanical segmentation mechanism and framework in the Python environment. First, based on the linguistic features of Yi, the special "negative word" structure is incorporated into the segmentation dictionary, and a segmentation standard better suited to Yi text is defined. Then, after an analysis of the genres of available Yi texts, only novels and folk tales consistent with everyday Yi usage are used as segmentation corpora. After analyzing and comparing the current mainstream segmentation methods, and in light of the difficulties of the task, the reasons for not using statistical or machine learning segmentation are explained. Finally, the segmenter is implemented in Python, and the results of maximum forward, maximum reverse, and maximum bidirectional matching are evaluated and compared.

The resulting system fills out the mechanical, syllable-based segmentation mechanism for Yi. It can serve as a segmentation tool for future large-scale text segmentation, grammatical, and lexical research, and it provides a realistic, objective theoretical basis and data for subsequent research on Yi segmentation.
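To make the three matching strategies concrete, the following minimal Python sketch shows dictionary-based maximum forward, maximum reverse, and maximum bidirectional matching. It is an illustration under stated assumptions rather than the thesis's actual implementation: the dictionary is assumed to be a set of word strings, max_len is the length of the longest dictionary entry, and the bidirectional tie-breaking rule (prefer the result with fewer words, then with fewer single-syllable fragments) is one common heuristic.

def forward_max_match(text, dictionary, max_len):
    # Greedily take the longest dictionary word starting at each position;
    # fall back to a single syllable when nothing longer matches.
    words, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                words.append(candidate)
                i += size
                break
    return words

def backward_max_match(text, dictionary, max_len):
    # The same idea, scanning from the end of the sentence toward the start.
    words, j = [], len(text)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            candidate = text[j - size:j]
            if size == 1 or candidate in dictionary:
                words.insert(0, candidate)
                j -= size
                break
    return words

def bidirectional_max_match(text, dictionary, max_len):
    # Run both scans and keep the segmentation with fewer words; on a tie,
    # prefer the one with fewer single-syllable fragments.
    forward = forward_max_match(text, dictionary, max_len)
    backward = backward_max_match(text, dictionary, max_len)
    if len(forward) != len(backward):
        return forward if len(forward) < len(backward) else backward
    def singles(segmentation):
        return sum(1 for word in segmentation if len(word) == 1)
    return forward if singles(forward) <= singles(backward) else backward

In practice the dictionary would hold Yi syllable strings loaded from the segmentation dictionary described above; any Python set of strings works for experimentation, and each function returns the list of segmented words.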
Although this thesis implements mechanical segmentation of Yi words, in this family of methods the segmentation dictionary largely determines both accuracy and speed: the number of entries in the dictionary directly determines segmentation accuracy, while dictionary size, processor speed, and available memory together determine segmentation speed and overall effectiveness. In future segmentation research, therefore, accuracy can only be improved by expanding the vocabulary covered by the segmentation dictionary, and the heavy dependence of mechanical segmentation on processors and memory can only be reduced through improved algorithms. The main trend for future Yi text processing is to rely on statistical methods and deep learning, which are also the more scientific approaches, so Yi text can only be handled well once soft resources such as dictionaries and corpora are in place. While these basic materials are being built, future Yi language processing can also draw on other current natural language processing methods and move toward speech synthesis, speech analysis, syntactic analysis, semantic analysis, text analysis, and natural language understanding.
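As one concrete direction for the algorithmic improvements mentioned above, the sketch below groups dictionary entries by their first syllable and records the longest entry in each group, so that the forward scan can cap its candidate length at every position and skip needless dictionary probes. This is a hedged illustration, not the thesis's code: the dictionary format (one word per line) and the file name nuosu_dict.txt are assumptions made only for the example.

from collections import defaultdict

def load_grouped_dictionary(path):
    # One word per line; also record, for each starting syllable, the length
    # of the longest dictionary entry that begins with it.
    words = set()
    longest_by_first = defaultdict(int)
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            word = line.strip()
            if not word:
                continue
            words.add(word)
            longest_by_first[word[0]] = max(longest_by_first[word[0]], len(word))
    return words, longest_by_first

def forward_max_match_capped(text, words, longest_by_first):
    # Forward maximum matching whose candidate length at each position is
    # capped by the longest entry starting with the current syllable.
    result, i = [], 0
    while i < len(text):
        cap = min(longest_by_first.get(text[i], 1), len(text) - i)
        for size in range(cap, 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in words:
                result.append(candidate)
                i += size
                break
    return result

For example, words, longest = load_grouped_dictionary("nuosu_dict.txt") followed by forward_max_match_capped(sentence, words, longest) behaves like the plain forward scan but never tries candidate lengths that cannot occur in the dictionary.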
Keywords/Search Tags: Yi Words, Word segmentation standard, Python, Information processing