Research On Uyghur Verb Phrases Automatic Chunking

Posted on:2013-07-13

Degree:Master

Type:Thesis

Country:China

Candidate:L P Y M M T M Zu

GTID:2248330374499277

Subject:Computer Science and Technology

Abstract/Summary:

Base phrase chunking is defined as extracting non-overlapping segments from a stream of text. It is a simple and effective preprocessing step for parsing, and many tasks, such as Information Retrieval, Information Extraction, and Question Analysis, can be performed adequately by identifying base phrases and the relationships between them. Base XP chunking has received considerable attention worldwide and has made substantial progress. For the Uyghur language, however, very little work has been done in this field so far, so research on Uyghur base phrase chunking is needed. Base VP chunking is considered more significant and more challenging in Uyghur than in English or Chinese because of the grammatical weight and importance of the verb phrase in the sentence. In order to develop a practical system for Uyghur base verb phrase chunking, this thesis investigates the following aspects.

Firstly, the thesis reviews and summarizes work on base XP chunking and describes the various approaches and systems that have been applied to it. It then examines the Conditional Random Fields (CRF) model in more depth and gives a thorough analysis of the characteristics of the Uyghur language, laying the groundwork for the main task of designing and implementing Uyghur verb phrase chunking.

Secondly, in order to take full advantage of word-level context information, the thesis proposes for the first time a new word tagging method in which stems and suffixes are segmented and given different labels. It also gives, for the first time, a definition of the Uyghur base verb phrase (Ubase VP) and of its inner structure. Using these definitions and tag sets, a morpheme-category-tagged corpus and a Ubase VP role-tagged corpus were prepared semi-automatically.

Thirdly, with these data available for training and testing, a group of Ubase VP chunking experiments based on the CRF statistical model was carried out, and the results were compared across different morpheme features. The morpheme-based tagging framework proposed in this thesis outperformed the approach that tags words without morpheme segmentation. The results also indicate that context morphemes are of great help in recognizing Ubase VPs in text. At the same time, they show that discriminative models for sequence labeling, such as CRF, work well for Ubase XP chunking.

Because there has been no detailed prior research on Uyghur base verb phrase chunking, there is no existing system to compare against, and the performance reported here can be regarded as the best achieved so far. However, the results are still far from sufficient for use in a real working system. Further work is needed to deepen the theoretical basis of this study and to extend it to base phrase chunking for other phrase categories.
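The morpheme-level tagging idea described above can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the category names (`N`, `V`, `CASE`, `TENSE`), the `SUF-` prefix, the `B-VP`/`I-VP`/`O` role tags, the one-morpheme context window, and the romanized example sentence are all assumptions chosen for illustration. The sketch flattens segmented words into a morpheme sequence, builds context-window features of the kind a CRF toolkit typically consumes, and decodes role tags back into chunk spans.

```python
from typing import Dict, List, Tuple

# A word is a stem followed by its suffixes, each paired with a category.
# Category names here are hypothetical, not the tag set defined in the thesis.
Word = List[Tuple[str, str]]


def flatten(words: List[Word]) -> List[Tuple[str, str]]:
    """Turn segmented words into one morpheme sequence.

    Stems keep their category; suffixes get a 'SUF-' prefix so that
    stems and suffixes carry different labels.
    """
    tokens = []
    for word in words:
        for i, (morph, cat) in enumerate(word):
            label = cat if i == 0 else "SUF-" + cat
            tokens.append((morph, label))
    return tokens


def features(tokens: List[Tuple[str, str]], i: int, window: int = 1) -> Dict[str, str]:
    """Context features for position i: neighbouring morphemes and labels."""
    feats = {}
    for off in range(-window, window + 1):
        j = i + off
        if 0 <= j < len(tokens):
            morph, label = tokens[j]
        else:
            morph, label = "<PAD>", "<PAD>"  # outside the sentence
        feats[f"morph[{off}]"] = morph
        feats[f"label[{off}]"] = label
    return feats


def bio_to_chunks(tags: List[str]) -> List[Tuple[int, int]]:
    """Decode B-VP / I-VP / O tags into half-open (start, end) spans."""
    chunks, start = [], None
    for i, tag in enumerate(tags):
        if tag == "B-VP":
            if start is not None:
                chunks.append((start, i))
            start = i
        elif tag == "I-VP":
            if start is None:  # tolerate an I-VP without a preceding B-VP
                start = i
        else:
            if start is not None:
                chunks.append((start, i))
                start = None
    if start is not None:
        chunks.append((start, len(tags)))
    return chunks


# Illustrative romanized input: "kitab-ni oqu-di" (roughly "read the book").
sentence = [[("kitab", "N"), ("ni", "CASE")], [("oqu", "V"), ("di", "TENSE")]]
tokens = flatten(sentence)
X = [features(tokens, i) for i in range(len(tokens))]
# Assumed gold role tags: only the verb stem and its suffix form the base VP.
y = ["O", "O", "B-VP", "I-VP"]
spans = bio_to_chunks(y)
```

The per-morpheme feature dictionaries in `X` and the tag sequence `y` have the shape that common CRF toolkits accept as one training sequence. Whether a Ubase VP also covers neighbouring words depends on the thesis's definition, which this sketch does not reproduce.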

Keywords/Search Tags:

Verb, Base phrases, Uyghur Language, Corpus, CRF

Related items

1	Qualifying The Verb Phrases And Noun Phrases In The Field Of Semantic Analysis
2	The Study On Basic Elements To Build Statistical Language Model Of Uyghur
3	The Establishment And Application Of Uyghur Speech Corpus Based On Online
4	Research Of Uyghur N-gram Model And Smoothing Algorithm
5	Research On The Construction Of Uyghur Text Corpus For Sign Language Information Processing
6	Research On Entity Recognition Of Person Names In Uyghur Text Corpus
7	Research And Implementation On Uyghur Language Semantics Search Engine Based On Ontology
8	Research And Application Of Uyghur-chinese Machine Translation Model Based On Deep Learning
9	The Construction And Research Of Chinese-uyghur Bilingual Comparable Corpus Automatic Acquisition System Based On Machine Translation
10	Research On Real Corpus Faced Automatic Acquisition Of Chinese Verb Subcategorization