
Research On Uyghur Verb Phrases Automatic Chunking

Posted on: 2013-07-13
Degree: Master
Type: Thesis
Country: China
Candidate: L P Y M M T M Zu
GTID: 2248330374499277
Subject: Computer Science and Technology
Abstract/Summary:
Base phrase chunking is defined as extracting non-overlapping segments from a stream of text data. It is a simple and effective preprocessing step for parsing, and many tasks, such as Information Retrieval, Information Extraction, and Question Analysis, can be performed adequately by identifying base phrases and the relationships between them. Much effort has been devoted to base XP chunking worldwide, and considerable progress has been made. For the Uyghur language, however, very little work has been done in this field so far, so research on base phrase chunking is needed. Base VP chunking is considered more significant and more challenging in Uyghur than in English or Chinese because of the high grammatical status and importance of verb phrases in sentences. In order to develop a practical system for Uyghur base verb phrase chunking, this thesis investigates the following aspects:

Firstly, base XP chunking is summarized and reviewed, and the various approaches and systems applied to chunking are described. The Conditional Random Fields (CRF) model is examined further as an effective model, and a thorough study of the characteristics of the Uyghur language is carried out, to lay a basis for the main work of designing and implementing Uyghur verb phrase chunking.

Secondly, to take full advantage of word-level context information, this study proposes, for the first time, a new word tagging method that segments words into stems and suffixes and labels each separately. The definition of the Uyghur base verb phrase (Ubase VP) and its inner structure is also introduced for the first time. Using these standard definitions and tag sets, a morpheme-category-tagged and Ubase VP role-tagged corpus is prepared semi-automatically.

Thirdly, with these data available for training and testing, a group of Ubase VP chunking experiments based on the CRF statistical model is performed, and the results are analyzed comparatively across different word-morpheme features. The morpheme-based tagging framework proposed in this thesis performed better than the non-segmented-morpheme tagging approach, and the results indicate that context morphemes are very helpful for recognizing Ubase VPs in text. The results also indicate that discriminative sequence-labeling models such as CRF can work well for Ubase XP chunking.

No detailed prior research on Uyghur base verb phrase chunking exists, so, with no system to compare against, these results are regarded as the best reported so far. However, the outcome is still far from usable in a real working system, and there is a strong need to deepen the theoretical basis of this study and to extend its application to base phrase chunking of other categories.
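As a rough illustration of the morpheme-level sequence labeling described above, the following Python sketch shows how context features might be built for each morpheme and how chunk spans might be read back from role tags. It is a sketch under stated assumptions, not the thesis's implementation: the BIO-style tags (`B-VP`, `I-VP`, `O`), the category labels (`N-STEM`, `V-STEM`, `V-SUF`), the function names, and the romanized toy segmentation are all hypothetical, since the abstract does not give the actual tag sets or feature templates.

```python
from typing import Dict, List, Tuple

# Hypothetical (form, category) pairs for a stem/suffix-segmented sentence.
# The segmentation and category labels are illustrative assumptions only.
Morpheme = Tuple[str, str]


def morpheme_features(seq: List[Morpheme], i: int) -> Dict[str, str]:
    """Build a feature dict for morpheme i using a one-morpheme context window."""
    form, cat = seq[i]
    return {
        "form": form,
        "cat": cat,
        # Neighbouring categories stand in for "context morpheme" information.
        "prev_cat": seq[i - 1][1] if i > 0 else "BOS",
        "next_cat": seq[i + 1][1] if i < len(seq) - 1 else "EOS",
    }


def extract_chunks(tags: List[str], label: str = "VP") -> List[Tuple[int, int]]:
    """Return (start, end) index spans, end exclusive, for chunks of `label`."""
    chunks: List[Tuple[int, int]] = []
    start = None
    for i, tag in enumerate(tags):
        continues = tag == f"I-{label}" and start is not None
        if start is not None and not continues:
            chunks.append((start, i))  # close the currently open chunk
            start = None
        if tag == f"B-{label}":
            start = i  # open a new chunk
    if start is not None:
        chunks.append((start, len(tags)))
    return chunks


# Toy input: one noun stem followed by a verb stem with two suffixes.
sentence: List[Morpheme] = [
    ("xet", "N-STEM"),
    ("yaz", "V-STEM"),
    ("di", "V-SUF"),
    ("m", "V-SUF"),
]
role_tags = ["O", "B-VP", "I-VP", "I-VP"]  # assumed role tags, one per morpheme

features = [morpheme_features(sentence, i) for i in range(len(sentence))]
spans = extract_chunks(role_tags)
```

In a setup like this, the per-morpheme feature dicts would be the input to a CRF toolkit of the reader's choice, and `extract_chunks` would turn the predicted tag sequence back into phrase spans for evaluation.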
Keywords/Search Tags: Verb, Base phrases, Uyghur Language, Corpus, CRF