Tibetan Sentence Boundary Recognition Method Based On Sentence Part Of Speech

Posted on: 2017-04-22
Degree: Master
Type: Thesis
Country: China
Candidate: M L Z Wan
GTID: 2278330488490211
Subject: Computer application technology
Abstract/Summary:
With the rapid development of computer and network technology, natural language processing for Tibetan has advanced quickly, and traditional Tibetan language research is now inseparable from it. Tibetan natural language processing has therefore become an important part of Tibetan language study. It is an interdisciplinary subject spanning the natural and social sciences, combining Tibetan linguistics, computer science, mathematics, logic and psychology, and it studies communication between people and between people and computers. Research in this area started relatively late. For a long period it dealt only with Tibetan characters, focusing on character encoding, font building and character processing. After national and international standards were set, research moved from characters to words: Tibetan word segmentation and tagging systems were developed, and a series of innovative results were obtained, such as statistics on Tibetan word frequency. These results laid a solid foundation for research on Tibetan sentences. Several research institutions and scholars have now begun studying the characteristics, attributes and elements of Tibetan sentences. All of these studies presuppose that the boundaries of Tibetan sentences can be determined. Only once Tibetan sentence boundary identification has matured can the study of sentence characteristics, element analysis and syntactic analysis be carried out accurately. The study of Tibetan sentence boundary identification is therefore of great theoretical value for lexical analysis, syntactic analysis, semantic analysis, pragmatic analysis, corpus construction and other fields.
Drawing on the existing literature, this paper analyses the concept and characteristics of the Tibetan sentence and carries out a statistical analysis of the ending forms, parts of speech and punctuation system of Tibetan sentences. It also studies the part-of-speech rules that hold at Tibetan sentence boundaries. Finally, it proposes a Tibetan sentence boundary identification method based on sentence-final part of speech. The method is simple and effective. It was tested on 4,133 sentences randomly sampled from a corpus of 35,126 sentences drawn from sources such as frequently used Tibetan sentences, Quotations of Chairman Jiang Zemin, laws and regulations, and Tibetan monographs. The accuracy rate is 99.98% and the recall rate is 99.98%, giving an F1 of 0.9998.
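The abstract does not list the actual rules, so the following is only a minimal sketch of what a boundary check driven by sentence-final part of speech might look like, together with the F1 arithmetic behind the reported figure. The tag names in `SENTENCE_FINAL_POS`, the `(token, pos)` input format, and the function names are all assumptions made for illustration; they are not taken from the thesis.

```python
from typing import List, Tuple

SHAD = "\u0f0d"  # TIBETAN MARK SHAD, the usual clause/sentence delimiter

# Hypothetical tag set: the thesis derives its own sentence-final POS rules
# from corpus statistics; these labels are placeholders only.
SENTENCE_FINAL_POS = {"VERB", "FINAL_PARTICLE"}


def find_boundaries(tagged: List[Tuple[str, str]]) -> List[int]:
    """Return indices of shad tokens judged to close a sentence.

    A shad counts as a sentence boundary only when the token just
    before it carries a POS tag from SENTENCE_FINAL_POS.
    """
    boundaries = []
    for i, (token, _pos) in enumerate(tagged):
        if token == SHAD and i > 0 and tagged[i - 1][1] in SENTENCE_FINAL_POS:
            boundaries.append(i)
    return boundaries


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Placeholder tokens stand in for real Tibetan words.
sample = [
    ("w1", "NOUN"), (SHAD, "PUNCT"),                   # shad after a noun: not a boundary
    ("w2", "NOUN"), ("w3", "VERB"), (SHAD, "PUNCT"),   # shad after a verb: boundary
]
boundaries = find_boundaries(sample)
score = f1_score(0.9998, 0.9998)
```

When the two rates are equal, as reported here, the harmonic mean equals that shared value, which is why the F1 comes out at 0.9998.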
Keywords/Search Tags: Tibetan, natural language processing, sentence boundary identification