Research On Key Techniques For Chinese Word Segmentation With The Combination Of Deep Learning Features And Shallow Machine Learning Features

Posted on: 2018-11-29
Degree: Master
Type: Thesis
Country: China
Candidate: Y Zhou
Full Text: PDF
GTID: 2348330518483392
Subject: Computer application technology
Abstract/Summary:
In recent years, with the advent of the Internet era, China's Internet technology has continued to develop. In daily life, companies and users want to extract text information from computer data quickly and accurately. For many Natural Language Processing tasks, word segmentation is the first step, so the quality of the segmentation can affect the accuracy of the tasks that follow. Chinese word segmentation still faces problems that limit its accuracy, such as unknown (out-of-vocabulary) words and ambiguous words. Scholars have proposed a series of methods to address these problems, which fall into three main groups: segmentation based on probabilistic statistical models, segmentation based on dictionary matching, and segmentation based on character tagging. With the wide application of machine learning in Natural Language Processing, scholars have proposed segmentation methods based on hidden Markov models and conditional random fields. As the amount of labeled data increases, machine-learning-based Chinese word segmentation improves considerably over the traditional methods.
Deep learning is now widely used in image processing, where it has achieved notable results. This thesis applies machine learning and deep learning together to Chinese word segmentation. The character-tagged corpus is first converted into vectors; these vectors are fed into a Long Short-Term Memory (LSTM) network, which adds contextual information to them and so gives the subsequent conditional random field (CRF) segmenter sufficient context, improving segmentation accuracy. Compared with convolutional neural networks, LSTM has the advantage of retaining context-dependent information. Compared with ordinary recurrent neural networks, it is less prone to vanishing and exploding gradients and can retain long-distance dependency information, which supports better segmentation.
The proposed model is tested on the corpus provided by Beijing Language and Culture University, and the traditional models are tested on the same data set. The tests show that Chinese word segmentation combining deep learning features with shallow machine learning features improves, to a certain extent, on traditional machine-learning, probabilistic-model, tagging-based and dictionary-based segmentation. On the Peking University corpus, our experiments achieved an F-value of 92.80%, an improvement of more than 1%.
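To make the tagging formulation and the reported F-value concrete, the following is a minimal sketch, not the thesis's implementation. It assumes the common four-tag B/M/E/S scheme (begin, middle, end, single), which the abstract does not specify, and it computes the F-value as the harmonic mean of word-level precision and recall. The function names `tags_to_words`, `word_spans` and `f_value` are hypothetical, as is the example sentence.

```python
def tags_to_words(chars, tags):
    """Rebuild words from per-character tags (assumed B/M/E/S scheme)."""
    words, current = [], ""
    for ch, tag in zip(chars, tags):
        current += ch
        if tag in ("E", "S"):  # a word ends on an E or S tag
            words.append(current)
            current = ""
    if current:  # tolerate a tag sequence that stops mid-word
        words.append(current)
    return words


def word_spans(words):
    """Represent each word by its (start, end) character offsets."""
    spans, start = set(), 0
    for w in words:
        spans.add((start, start + len(w)))
        start += len(w)
    return spans


def f_value(gold_words, pred_words):
    """Harmonic mean of word-level precision and recall."""
    gold, pred = word_spans(gold_words), word_spans(pred_words)
    correct = len(gold & pred)  # words with identical boundaries
    if correct == 0:
        return 0.0
    precision = correct / len(pred)
    recall = correct / len(gold)
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: the tagger splits the last word incorrectly.
sentence = "我爱北京天安门"
gold = ["我", "爱", "北京", "天安门"]
predicted = tags_to_words(sentence, ["S", "S", "B", "E", "B", "E", "S"])
score = f_value(gold, predicted)
```

Here three of the five predicted words match the four gold words, so precision is 3/5, recall is 3/4, and the F-value is 2/3. In a model of the kind the abstract describes, the tag sequence would come from the CRF layer rather than being written by hand.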
Keywords/Search Tags:Chinese word segmentation, Machine Learning, conditional random field, neural network, LSTM