Research On Key Techniques For Chinese Word Segmentation With The Combination Of Deep Learning Features And Shallow Machine Learning Features

Posted on:2018-11-29

Degree:Master

Type:Thesis

Country:China

Candidate:Y Zhou

GTID:2348330518483392

Subject:Computer application technology

Abstract/Summary:

In recent years, with the arrival of the Internet era, China's Internet technology has continued to develop. In daily life, companies and users want to obtain text data from computers quickly and accurately. For many Natural Language Processing tasks, word segmentation is the first step, and the quality of the segmentation can affect the accuracy of the downstream task. Several problems still limit the accuracy of Chinese word segmentation, such as unknown (out-of-vocabulary) words and ambiguous segmentations. Researchers have proposed a series of methods to address these problems, which fall into three main groups: segmentation based on statistical probability models, segmentation based on dictionary matching, and segmentation based on character tagging. As machine learning methods have been widely applied in Natural Language Processing, researchers have proposed models such as the hidden Markov model and the conditional random field (CRF) for this task. As the amount of labeled data increases, machine-learning-based Chinese word segmentation improves greatly over traditional methods.

Deep learning is now widely used in image processing, where it has achieved notable results. This paper applies both machine learning and deep learning to Chinese word segmentation. The characters of a tagged corpus are converted into vectors, and these vectors are fed into a Long Short-Term Memory (LSTM) network, whose outputs supply sufficient context information to a subsequent conditional random field segmentation layer, thereby improving segmentation accuracy. Compared with convolutional neural networks, the LSTM has the advantage of retaining context-dependent information; compared with ordinary recurrent neural networks, it is less prone to vanishing and exploding gradients and preserves long-distance dependencies, which supports better segmentation.

The proposed model was tested on the corpus provided by Beijing Language and Culture University, and traditional models were tested on the same data set. The tests show that segmentation combining deep learning features with shallow machine learning features improves, to a certain extent, on traditional machine-learning segmentation, probabilistic-model segmentation, character-tagging segmentation and dictionary-based segmentation. On the Peking University corpus, our experiments achieved an F-score of 92.80%, an improvement of more than 1%.
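The dictionary-matching approach and the ambiguity problem mentioned in the abstract can be illustrated with forward maximum matching, one common dictionary-based algorithm. This is a minimal sketch, not the baseline actually used in the thesis: the function name `forward_max_match`, the toy dictionary, and the maximum word length of 4 are all assumptions made for illustration.

```python
def forward_max_match(text, dictionary, max_len=4):
    """Greedy left-to-right segmentation (hypothetical helper): at each
    position take the longest dictionary word that matches, falling back
    to a single character when nothing matches."""
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until a match is found.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                words.append(candidate)
                i += size
                break
    return words


# Toy dictionary, assumed for illustration only.
toy_dict = {"研究", "研究生", "生命", "起源"}
print(forward_max_match("研究生命起源", toy_dict))  # ['研究生', '命', '起源']
```

With this toy dictionary the greedy match takes 研究生 ("graduate student") first and returns 研究生 / 命 / 起源 instead of the intended 研究 / 生命 / 起源 ("study / life / origin"), which is exactly the kind of ambiguity error the abstract describes.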
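Character-tagging segmentation, on which the LSTM-CRF model also relies, turns segmentation into labeling each character. The abstract does not name the tag set; the sketch below assumes the widely used four-tag BMES scheme (Begin, Middle and End of a multi-character word, Single-character word), and the helper names `words_to_bmes` and `bmes_to_words` are hypothetical.

```python
def words_to_bmes(words):
    """Convert a segmented sentence (list of words) to one BMES tag per character."""
    tags = []
    for word in words:
        if len(word) == 1:
            tags.append("S")
        else:
            tags.extend(["B"] + ["M"] * (len(word) - 2) + ["E"])
    return tags


def bmes_to_words(chars, tags):
    """Rebuild words from characters and their BMES tags."""
    words, current = [], ""
    for ch, tag in zip(chars, tags):
        current += ch
        if tag in ("E", "S"):  # E and S close the current word
            words.append(current)
            current = ""
    if current:  # keep a trailing fragment if the tags end mid-word
        words.append(current)
    return words


sentence = ["我", "爱", "北京天安门"]
tags = words_to_bmes(sentence)
print(tags)  # ['S', 'S', 'B', 'M', 'M', 'M', 'E']
print(bmes_to_words("".join(sentence), tags))  # ['我', '爱', '北京天安门']
```

Because the conversion is lossless in both directions, a model only has to predict one tag per character, and the words can be read back from the tag sequence.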
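In the LSTM-CRF design the abstract describes, the LSTM produces a score for every tag at every character, and the CRF layer chooses the best-scoring tag sequence as a whole. The sketch below shows only that decoding step (Viterbi search over a linear-chain CRF) and rests on stated assumptions: BMES tags, hand-set transition scores (0 for a valid move, minus infinity for an invalid one) in place of the transition weights a trained CRF would learn, and invented emission scores standing in for real LSTM outputs. Training and the LSTM itself are omitted, and the names are hypothetical.

```python
import math

TAGS = ["B", "M", "E", "S"]
NEG_INF = -math.inf

# Transitions that keep a BMES sequence well-formed; all others are forbidden.
ALLOWED = {
    "B": {"M", "E"},
    "M": {"M", "E"},
    "E": {"B", "S"},
    "S": {"B", "S"},
}


def transition_score(prev, cur):
    # Assumption: a trained CRF would learn these scores; here valid
    # moves score 0 and invalid ones score minus infinity.
    return 0.0 if cur in ALLOWED[prev] else NEG_INF


def viterbi(emissions):
    """emissions: one dict per character mapping tag -> score
    (standing in for the per-character outputs of an LSTM)."""
    # A sentence may only start with B or S.
    score = {t: (emissions[0][t] if t in ("B", "S") else NEG_INF) for t in TAGS}
    backpointers = []
    for emit in emissions[1:]:
        new_score, pointers = {}, {}
        for cur in TAGS:
            # Best previous tag given the accumulated score plus transition.
            best_prev = max(TAGS, key=lambda p: score[p] + transition_score(p, cur))
            new_score[cur] = score[best_prev] + transition_score(best_prev, cur) + emit[cur]
            pointers[cur] = best_prev
        score = new_score
        backpointers.append(pointers)
    # A sentence may only end with E or S; then follow the back-pointers.
    path = [max(("E", "S"), key=lambda t: score[t])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    return list(reversed(path))


# Invented emission scores for the four characters of 北京大学.
emissions = [
    {"B": 2.0, "M": 0.1, "E": 0.1, "S": 1.0},
    {"B": 0.2, "M": 0.5, "E": 1.5, "S": 1.6},
    {"B": 1.2, "M": 0.3, "E": 0.2, "S": 1.0},
    {"B": 0.1, "M": 0.2, "E": 2.0, "S": 0.5},
]
print(viterbi(emissions))  # ['B', 'E', 'B', 'E']
```

Choosing the highest-scoring tag for each character independently would give B, S, B, E here, which is not a valid tag sequence (a word that has begun cannot be followed by a single-character word). The transition scores rule that out, and the decoder returns B, E, B, E, that is, 北京 / 大学. This illustrates why a CRF layer is placed on top of the LSTM rather than tagging each character separately.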
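In Chinese word segmentation, the "F value" reported above is normally the balanced F-measure of word-level precision and recall, where a predicted word counts as correct only if both its start and end positions match the gold standard. The abstract does not say which scoring tool was used, so the following is only a sketch of that standard metric; the function names are hypothetical.

```python
def word_spans(words):
    """Map a list of words to the set of (start, end) character offsets."""
    spans, start = set(), 0
    for word in words:
        spans.add((start, start + len(word)))
        start += len(word)
    return spans


def precision_recall_f1(gold_words, pred_words):
    gold, pred = word_spans(gold_words), word_spans(pred_words)
    correct = len(gold & pred)  # words whose boundaries match exactly
    precision = correct / len(pred)
    recall = correct / len(gold)
    f1 = 2 * precision * recall / (precision + recall) if correct else 0.0
    return precision, recall, f1


gold = ["研究", "生命", "起源"]
pred = ["研究生", "命", "起源"]
print(precision_recall_f1(gold, pred))
```

Only 起源 has matching boundaries in both segmentations, so precision, recall and F are each one third. For a whole test set, the correct, predicted and gold word counts are summed over all sentences before precision, recall and F are computed.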

Keywords/Search Tags:

Chinese word segmentation, machine learning, conditional random field, neural network, LSTM

Related items

1	The Key Technology On Chinese Word Segmentation Based On Bi-LSTM-CRF Model
2	Research On Chinese Word Segmentation Based On Deep Learning
3	Research And System Implementation Of Chinese Word Segmentation In Specialized Fields Based On Conditional Random Fields
4	Research On Chinese-Braille Translation Of Word Segmentation And Link Writing
5	Research And Application Of Chinese Word Segmentation Method Based On Conditional Random Field
6	Research And Implement Of Chinese Word Segment Techniques Based On The Conditional Random Field
7	Research On Chinese Word Segmentation Method Based On Statistical Learning
8	Research Of Chinese Word Segmentation With Conditional Random Fields
9	Research On Chinese Word Segmentation For Food Safety Emergencies