Research On Lexical Analysis Based On Neural Networks

Posted on: 2018-03-27
Degree: Master
Type: Thesis
Country: China
Candidate: Z T Yu
Full Text: PDF
GTID: 2348330512998165
Subject: Computer Science and Technology
Abstract/Summary:
Lexical analysis is a basic task in Natural Language Processing. It contains two subtasks: word segmentation and part-of-speech (POS) tagging. Word segmentation converts a Chinese character string into a sequence of Chinese words, and almost all Chinese text analysis tasks depend on it. POS tagging assigns a part-of-speech tag to each word in a sentence. For higher-level tasks such as syntactic analysis and semantic analysis, POS tags help to resolve ambiguity and alleviate data sparseness. Although lexical analysis is relatively basic, it has a wide range of needs and application prospects, and it remains a popular research topic in Natural Language Processing.

Early work on Chinese word segmentation adopted dictionary-based rule methods because of limited computing resources and a lack of annotated corpora. As computing power and annotated corpora grew, Chinese word segmentation gradually shifted from rule-based methods to machine learning methods, among which character tagging is the most frequently used. After the rise of deep learning, some researchers also began to apply neural methods to segmentation. POS tagging has followed a similar research path.

First, we find that traditional linear models extract features only within a limited window and cannot capture long-term dependencies. We therefore propose to replace the original feature extraction module with bidirectional long short-term memory networks, which can retain long-distance information and simplify feature engineering. Second, we design greedy and structured segmentation models based on bidirectional long short-term memory networks. Finally, we design task-related word embedding models for both word segmentation and POS tagging, to address the problem that general word embeddings do not fit the specific task.

The experimental results show that the segmentation model based on bidirectional long short-term memory networks achieves performance comparable to that of the traditional model. The results of the simple and fast greedy model and the structured model are quite close. When Word-Context character embeddings are added to the segmentation model, we obtain state-of-the-art or comparable results on the standard data set, and we also obtain better results in domain adaptation. For the POS tagging model, tagging performance is also improved by the Pos-Context Sensitive (PCS) embeddings, and the PCS model helps us utilize heterogeneous data quickly.
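
The character tagging formulation mentioned in the abstract can be illustrated with a minimal sketch. It assumes the common four-tag B/M/E/S scheme (begin, middle, end, single), which the abstract does not specify. The function names and the sample sentence are hypothetical and are not taken from the thesis.

```python
def words_to_tags(words):
    """Convert a segmented sentence into per-character B/M/E/S tags."""
    tags = []
    for word in words:
        if not word:
            continue  # skip empty strings so they produce no tags
        if len(word) == 1:
            tags.append("S")  # single-character word
        else:
            # first character B, last character E, anything between M
            tags.extend(["B"] + ["M"] * (len(word) - 2) + ["E"])
    return tags


def tags_to_words(chars, tags):
    """Recover words from characters and their B/M/E/S tags."""
    words, current = [], ""
    for ch, tag in zip(chars, tags):
        current += ch
        if tag in ("E", "S"):  # a word boundary follows E or S
            words.append(current)
            current = ""
    if current:  # tolerate a sequence that ends without E or S
        words.append(current)
    return words


# Hypothetical example sentence, already segmented by hand.
words = ["自然", "语言", "处理", "很", "有趣"]
chars = "".join(words)
tags = words_to_tags(words)
print(tags)
print(tags_to_words(chars, tags))
```

Under this framing, a segmenter only has to predict one tag per character, so any sequence labeling model, linear or neural, can be used, and the words are read off from the predicted tags.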
Keywords/Search Tags:Chinese Word Segmentation, Part-of-speech Tagging, Sequence Labeling, Word Embedding, Bidirectional Long Short Term Memory Networks