Tibetan Sentence Boundary Recognition Method Based On Sentence Part Of Speech

Posted on:2017-04-22

Degree:Master

Type:Thesis

Country:China

Candidate:M L Z Wan

Full Text:PDF

GTID:2278330488490211

Subject:Computer application technology

Abstract/Summary:

With the rapid development of computer and network technology, natural language processing (NLP) for Tibetan has advanced quickly, and traditional Tibetan language research has become inseparable from it. Tibetan NLP is therefore an important part of Tibetan language study. It is an interdisciplinary field spanning the natural and social sciences, combining Tibetan linguistics, computer science, mathematics, logic and psychology, and it focuses on communication between people and between people and computers.
Tibetan NLP research started relatively late. For a long period it dealt only with Tibetan characters, focusing on character encoding, font building and character processing. After national and international standards were set, research moved from characters to words: Tibetan word segmentation and tagging systems were developed, and a series of innovative results was obtained, such as statistics on Tibetan word frequency. These results laid a solid foundation for research on Tibetan sentences. Several research institutions and scholars have now begun studying the characteristics, attributes and elements of Tibetan sentences. All of these studies presuppose that the boundaries of Tibetan sentences can be determined: only when sentence boundary identification is mature can the study of sentence characteristics, element analysis and syntactic analysis be carried out accurately. The study of Tibetan sentence boundary identification is therefore of great theoretical value for lexical, syntactic, semantic and pragmatic analysis, corpus construction and other fields.
Drawing on the existing literature, this thesis analyses the concept and characteristics of the Tibetan sentence and carries out a statistical analysis of the ending forms, sentence-final parts of speech and punctuation system of Tibetan sentences. It also studies the part-of-speech rules that hold at Tibetan sentence boundaries. On this basis, it proposes a Tibetan sentence boundary identification method based on the sentence-final part of speech. The method is simple and effective. It was tested on 4,133 sentences randomly sampled from a corpus of 35,126 sentences drawn from sources such as frequently used Tibetan sentences, Quotations of Chairman Jiang Zemin, laws and regulations, and Tibetan monographs. The accuracy rate is 99.98% and the recall rate is 99.98%, with an F1 of 0.9998.
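The abstract does not spell out the rules themselves, so the following is only a minimal sketch of what a boundary detector driven by the sentence-final part of speech might look like, not the thesis's actual method. It assumes the input is already segmented and tagged, that the shad (U+0F0D) is the candidate boundary mark, and that a shad closes a sentence only when the preceding word carries a sentence-final tag. The tag names `VERB_FINAL` and `FINAL_PARTICLE`, the placeholder words, and both function names are hypothetical. A small helper for precision, recall and F1 is included because those are the figures the abstract reports.

```python
from typing import List, Sequence, Set, Tuple

Token = Tuple[str, str]  # (surface form, part-of-speech tag)

SHAD = "\u0f0d"  # TIBETAN MARK SHAD, used here as the candidate boundary mark

# Hypothetical tag names: the thesis's real tag set is not given in the abstract.
SENTENCE_FINAL_POS: Set[str] = {"VERB_FINAL", "FINAL_PARTICLE"}


def find_boundaries(tokens: Sequence[Token]) -> List[int]:
    """Return indices of shad tokens judged to end a sentence.

    Assumed rule: a shad is a sentence boundary only if the token just
    before it has a sentence-final part of speech.
    """
    boundaries = []
    for i, (form, _pos) in enumerate(tokens):
        if form != SHAD or i == 0:
            continue
        if tokens[i - 1][1] in SENTENCE_FINAL_POS:
            boundaries.append(i)
    return boundaries


def precision_recall_f1(predicted: Sequence[int],
                        gold: Sequence[int]) -> Tuple[float, float, float]:
    """Score predicted boundary positions against gold positions."""
    pred, ref = set(predicted), set(gold)
    true_pos = len(pred & ref)
    precision = true_pos / len(pred) if pred else 0.0
    recall = true_pos / len(ref) if ref else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Placeholder words (w1, w2, w3) stand in for real Tibetan tokens.
example = [
    ("w1", "NOUN"), (SHAD, "PUNCT"),        # shad after a noun: not a boundary
    ("w2", "NOUN"), ("w3", "VERB_FINAL"),
    (SHAD, "PUNCT"),                        # shad after a final verb: boundary
]
print(find_boundaries(example))
print(precision_recall_f1(find_boundaries(example), [4]))
```

Under these assumptions the first shad is skipped and only the second is reported, which is the kind of disambiguation a sentence-final part-of-speech rule is meant to provide. Because F1 is the harmonic mean of precision and recall, equal values for the two yield the same F1, consistent with the figures reported above.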

Keywords/Search Tags:

Tibetan, natural language processing, sentence boundary identification

Related items

1	The Research Of Chinese Sentence Similarity Based On Layered
2	A Sentence Representation Method Based On Syntax And Semantic
3	Research On The Technology And Key Problems Of Automatic Video Clip And Mixing Based On Natural Language Processing
4	Research And Implementation Of Subjective Automated Assessment System Based On Natural Language Processing
5	Research Of Simple Sentence Similarity In Uyghur Language Teaching Materials Used In Primary School
6	Research Of A Kazakh Sentence Similarity Computing
7	Research On Tibetan Text Classification Technology Based On TWC-CNN
8	Studies On The Usage Of Preposition And Conjunction In Phrase Structure Syntactic Parsing
9	Research On Copying And Testing Technology Of Tibetan Text
10	Research On Chinese Sentence Compression