Research On Chinese Part-of-speech Tagging Based On Semi Hidden Markov Model

Posted on: 2016-07-04  Degree: Master  Type: Thesis
Country: China  Candidate: X Han
GTID: 2308330461978680  Subject: Computer application technology
Abstract/Summary:
With the development of computer science and web technology, computers have become an indispensable tool in daily life. How to communicate with computers effectively has drawn growing attention, which gave rise to natural language processing (NLP), the technology for processing natural language by computer. Part-of-speech (POS) tagging is a basic preprocessing task that plays a vital role in downstream NLP research, so research on POS tagging is of great significance.

This paper proposes to improve the tagging precision of HMM-based POS tagging in three ways. First, semi-supervised learning on a small-scale corpus is used to enlarge the corpus iteratively. This improves HMM tagging, especially the tagging precision on corpora from different domains. Second, word similarity is used to improve the tagging of out-of-vocabulary (OOV) words. For high-frequency OOV words, similar words are found and their POS tags are used as candidate tags; for OOV words whose frequency is less than ten, similar contexts are found by searching for similar strings, and the tags of the words in those contexts are used to tag the OOV word. Third, during HMM tagging, the two best paths are kept and a second round of selection between them yields the tagging result.

Experimental results show that the proposed method outperforms traditional HMM-based POS tagging by 2.6%, and the tagging precision reaches 95.65%.
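The idea of keeping two best paths during HMM decoding can be illustrated with a small sketch. This is not the thesis's implementation: the function name `two_best_viterbi`, the toy tag set, and all probabilities below are assumptions made for illustration, and the abstract does not say how the second selection between the two paths is made, so the sketch stops at returning both candidates.

```python
import math
from heapq import nlargest


def two_best_viterbi(obs, states, start_p, trans_p, emit_p, k=2):
    """Return up to k highest-scoring tag sequences as (log_prob, tags).

    Hypothetical helper: a k-best variant of Viterbi decoding that keeps
    the k best partial paths ending in each state at every position.
    """
    def lg(p):
        # Log-probability; zero probability maps to negative infinity.
        return math.log(p) if p > 0 else float("-inf")

    # beams[s] holds up to k (log_prob, path) hypotheses ending in state s.
    beams = {
        s: [(lg(start_p.get(s, 0)) + lg(emit_p[s].get(obs[0], 0)), [s])]
        for s in states
    }
    for word in obs[1:]:
        new_beams = {}
        for s in states:
            e = lg(emit_p[s].get(word, 0))
            # Extend every kept hypothesis from every previous state into s.
            cands = [
                (lp + lg(trans_p[prev].get(s, 0)) + e, path + [s])
                for prev in states
                for lp, path in beams[prev]
            ]
            new_beams[s] = nlargest(k, cands, key=lambda c: c[0])
        beams = new_beams

    finals = [hyp for s in states for hyp in beams[s]]
    return nlargest(k, finals, key=lambda c: c[0])


# Toy model (made-up numbers): two tags, N (noun) and V (verb).
states = ["N", "V"]
start_p = {"N": 0.6, "V": 0.4}
trans_p = {"N": {"N": 0.3, "V": 0.7}, "V": {"N": 0.8, "V": 0.2}}
emit_p = {
    "N": {"fish": 0.5, "swim": 0.1},
    "V": {"fish": 0.3, "swim": 0.5},
}

paths = two_best_viterbi(["fish", "swim"], states, start_p, trans_p, emit_p)
for log_prob, tags in paths:
    print(tags, round(math.exp(log_prob), 4))
```

On this toy input the best path is N V and the runner-up is V V. A second-stage selector, for example one that rescores the two candidates with extra features, would take `paths` as its input; that stage is left out here because the abstract does not describe it.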
Keywords/Search Tags: Part-of-speech tagging, word vector, word similarity, iterative training