Research On Chinese Word Segmentation For Food Safety Emergencies

Posted on: 2018-01-24
Degree: Master
Type: Thesis
Country: China
Candidate: Y Zhang
GTID: 2428330575967108
Subject: Agriculture
Abstract/Summary:
With the development of society and rising living standards, people are paying more and more attention to food safety, which is closely tied to public health and personal safety. Food safety incidents keep emerging and cannot be fully prevented. Factors related to food safety itself are the main reason the problem cannot be solved at its root. Establishing a database of food safety incidents can make food safety information open and transparent, which both supports effective food safety supervision and gives consumers access to accurate food safety information.

The main subject of this thesis is Chinese word segmentation for food safety emergencies. In Chinese natural language processing, word segmentation is often the first step, and its accuracy has a significant impact on follow-up tasks. Nearly 5,000 records on food safety emergencies, totaling 2,033,539 words, were collected and preprocessed; word segmentation is the necessary step before the data can be entered. Among current Chinese word segmentation methods, statistical methods based on character tagging are the mainstream, and the method based on the conditional random field (CRF) model performs best.

The third chapter focuses on feature selection and model optimization for CRF-based word segmentation. It analyzes the word length distribution of the food safety emergencies corpus and experiments with different feature selections and feature templates to measure their effect on segmentation results. In the experiments, selecting only the basic features plus the position feature under the 4Tag and 5Tag schemes worked well, with F-scores of 92.87% and 92.88% respectively; adding other features lowered the F-score. When the binary features C-1C0, C0C1, and C-1C1 were removed from the feature template, the F-score dropped to 86.33%, which is 6.55 percentage points lower than with the original feature template. Adding further feature templates did not change the F-score significantly.

Deep learning does not require manually designed features and produces end-to-end output. Applied to Chinese word segmentation, a deep and complex network can mine and learn the internal information of the text and take the whole text into account to obtain better segmentation results. The bidirectional long short-term memory (BiLSTM) neural network model captures contextual information within a sentence well and, to a certain extent, addresses the long-range dependency problem during training. The fourth chapter applies this model to Chinese word segmentation. After tuning the training parameters, its F-score stabilized at 94.56%, exceeding the result of the CRF-based method in the third chapter.

Owing to the development of the Internet and the nature of the food safety domain, specialized terms and Internet buzzwords such as "swelling agent", "dispensing shrimp", and "zombie meat" often appear in the corpus. Both the CRF model and the deep learning model handle such unknown words comparatively well when segmenting food safety emergency texts, and both achieve good segmentation results. For Chinese word segmentation of food safety emergencies, deep learning performs better than the CRF model. Finally, the two approaches are compared in terms of training data, model, computing resources, training tools, training time, and experimental results; their respective advantages and disadvantages are analyzed, and views on the future development of Chinese word segmentation methods are offered.
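
The character-tagging formulation, the binary features C-1C0, C0C1, and C-1C1, and the F-score mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation. The B/M/E/S label names are an assumption (a common 4-tag convention; the abstract names only "4Tag" and "5Tag"), the unigram features C-1, C0, and C1 and the `<PAD>` boundary symbol are assumptions, and the function names `words_to_tags`, `char_features`, `word_spans`, and `f_score` are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def words_to_tags(words: List[str]) -> List[str]:
    """Map a segmented sentence to per-character labels.

    Assumed 4-tag convention: B = begin, M = middle, E = end, S = single.
    """
    tags: List[str] = []
    for word in words:
        if len(word) == 1:
            tags.append("S")
        else:
            tags.extend(["B"] + ["M"] * (len(word) - 2) + ["E"])
    return tags


def char_features(chars: List[str], i: int) -> Dict[str, str]:
    """Features for the character at position i.

    C0 is the current character, C-1 the previous, C1 the next.
    The three binary features are the ones named in the abstract;
    the unigram features and "<PAD>" are illustrative assumptions.
    """
    def c(offset: int) -> str:
        j = i + offset
        return chars[j] if 0 <= j < len(chars) else "<PAD>"

    return {
        "C-1": c(-1),
        "C0": c(0),
        "C1": c(1),
        "C-1C0": c(-1) + c(0),
        "C0C1": c(0) + c(1),
        "C-1C1": c(-1) + c(1),
    }


def word_spans(words: List[str]) -> Set[Tuple[int, int]]:
    """Character-offset spans (start, end) of each word in a segmentation."""
    spans: Set[Tuple[int, int]] = set()
    start = 0
    for word in words:
        spans.add((start, start + len(word)))
        start += len(word)
    return spans


def f_score(gold: List[str], pred: List[str]) -> float:
    """Word-level F-score: harmonic mean of precision and recall."""
    gold_spans, pred_spans = word_spans(gold), word_spans(pred)
    correct = len(gold_spans & pred_spans)
    if correct == 0:
        return 0.0
    precision = correct / len(pred_spans)
    recall = correct / len(gold_spans)
    return 2 * precision * recall / (precision + recall)


gold = ["食品", "安全", "事件"]
pred = ["食品", "安全事件"]
print(words_to_tags(gold))
print(char_features(list("".join(gold)), 0))
print(f_score(gold, pred))
```

In this toy example only one of the two predicted words matches a gold word, so precision is 1/2, recall is 1/3, and the F-score is 0.4. A CRF trained on such per-character features would predict the tag sequence, from which the word boundaries are read off.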
Keywords/Search Tags:Food safety, Chinese word segmentation, Conditional random field, Deep learning