Research On Named Entity Recognition Of Chinese Film And Television Scripts Based On Deep Learning

Posted on: 2020-02-13
Degree: Master
Type: Thesis
Country: China
Candidate: L X Zhai
GTID: 2415330620458143
Subject: Mathematics
Abstract/Summary:
With the rapid development of the Internet, people have grown used to browsing information and watching films and TV programs at any time. The growth of the film and television industry has stimulated the creative enthusiasm of many writers, but the rising number of scripts poses a huge challenge for film and television reviewers. Accurately identifying the named entities in Chinese film and television scripts helps filter out specific person names, place names and institution names, which makes the reviewers' work easier.

Chinese film and television scripts contain many dialogues between characters, and almost all character names appear before the colon that opens a line of dialogue. Based on this writing characteristic, this thesis proposes a rule-based identification method that analyzes the text before the colon to identify person names. An experiment on the “Soldier Assault” script achieves a precision of 97.47%, a recall of 55.49% and an F-value of 70.72%.

Because the dialogues in Chinese film and television scripts tend to be lifelike and colloquial, they contain many stop words. This thesis therefore proposes a method that combines stop-word removal with Bi-LSTM-CRF: first, the stop words are removed from the scripts; second, a Bi-LSTM-CRF model is used to identify named entities. Experimental results on the “Soldier Assault” corpus show that, for person-name recognition, this method increases the F-value by 26.67% compared with the rule-based method. Compared with a word-based Bi-LSTM method, it increases the overall F-value for person names, place names and institution names by 19.04%.

Because annotated corpora of film and television scripts are scarce while standard corpora in the news domain are plentiful, the thesis also experiments with combining the January 1998 “People's Daily” corpus with the “Soldier Assault” corpus. The results show that adding an appropriate amount of the “People's Daily” corpus to the “Soldier Assault” corpus improves the F-value of named entity recognition.
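The colon rule for person names can be sketched as follows. This is a minimal illustration, not the thesis's implementation: the regular expression (2 to 4 CJK characters followed by a full-width or ASCII colon), the set of non-name labels and the sample script lines are all hypothetical assumptions.

```python
import re
from collections import Counter

# Hypothetical rule: 2-4 CJK characters at the start of a line, then a
# full-width (：) or ASCII (:) colon.
SPEAKER_PATTERN = re.compile(r"^\s*([\u4e00-\u9fff]{2,4})\s*[：:]")

# Hypothetical labels that also precede a colon but are not person names.
NON_NAME_LABELS = {"场景", "时间", "地点", "旁白"}


def extract_speaker_names(script_lines, min_count=1, exclude=NON_NAME_LABELS):
    """Return candidate person names that appear before a colon in dialogue lines."""
    counts = Counter()
    for line in script_lines:
        match = SPEAKER_PATTERN.match(line)
        if match and match.group(1) not in exclude:
            counts[match.group(1)] += 1
    # Keep only names seen at least min_count times to filter one-off noise.
    return {name for name, n in counts.items() if n >= min_count}


# Invented sample lines for illustration only.
lines = [
    "场景：钢七连宿舍",
    "许三多：班长，我不想走。",
    "史今：三多，你听我说。",
    "许三多：嗯。",
]
print(sorted(extract_speaker_names(lines)))               # → ['史今', '许三多']
print(sorted(extract_speaker_names(lines, min_count=2)))  # → ['许三多']
```

Because such a rule only looks at speaker positions, names mentioned only inside the dialogue text are never matched; that is one plausible reason a colon rule can reach high precision but low recall.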
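The three figures reported for the rule-based method are consistent with the usual balanced F-measure, in which the F-value is the harmonic mean of precision P and recall R, F = 2PR / (P + R). A short check, assuming that definition:

```python
def f_value(precision, recall):
    """Balanced F-measure (F1): the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Figures reported for the rule-based method on the "Soldier Assault" script (in %).
print(round(f_value(97.47, 55.49), 2))  # → 70.72
```

Since the harmonic mean is pulled toward the smaller value, the low recall dominates the F-value even though precision is close to 100%.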
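Stop-word removal ahead of sequence labeling might look like the sketch below. The stop-word list is a hypothetical set of colloquial particles (the thesis's actual list is not given), tokens are assumed to be already segmented, and the BIO tags are an assumed labeling scheme; when training tags are supplied they are filtered in step with the tokens so the two sequences stay aligned.

```python
# Hypothetical stop-word list of colloquial particles; not the thesis's list.
STOP_WORDS = {"啊", "吧", "呢", "嗯", "哦", "了"}


def remove_stop_words(tokens, tags=None, stop_words=STOP_WORDS):
    """Drop stop-word tokens; if tags are given, drop the aligned tags too."""
    if tags is not None and len(tags) != len(tokens):
        raise ValueError("tokens and tags must have the same length")
    keep = [i for i, tok in enumerate(tokens) if tok not in stop_words]
    kept_tokens = [tokens[i] for i in keep]
    if tags is None:
        return kept_tokens
    return kept_tokens, [tags[i] for i in keep]


# Invented, pre-segmented example with hypothetical BIO tags.
tokens = ["嗯", "许三多", "去", "钢七连", "了", "吧"]
tags = ["O", "B-PER", "O", "B-ORG", "O", "O"]
print(remove_stop_words(tokens, tags))
# → (['许三多', '去', '钢七连'], ['B-PER', 'O', 'B-ORG'])
```

One caveat of this design: removing a stop word that falls inside an entity span would break the B/I tag structure, so a stop-word list used this way should avoid items that can occur within names.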
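In a Bi-LSTM-CRF tagger, the Bi-LSTM assigns each token a score for every tag, and the CRF layer adds tag-to-tag transition scores so that the whole tag sequence is scored jointly; the best sequence is then found with the Viterbi algorithm. The sketch below covers only that decoding step, in plain Python with hypothetical scores and a BIO tag set. It omits the Bi-LSTM itself, CRF training (the forward algorithm and loss) and start/end transitions, and it is not the thesis's implementation.

```python
def viterbi_decode(emissions, transitions, tags):
    """Return the highest-scoring tag sequence for a linear-chain CRF.

    emissions[t][j]   -- score of tag j at position t (e.g. a Bi-LSTM output)
    transitions[i][j] -- score of moving from tag i to tag j
    """
    if not emissions:
        return []
    n_tags = len(tags)
    # score[j]: best total score of any tag path ending in tag j so far.
    score = list(emissions[0])
    backpointers = []
    for t in range(1, len(emissions)):
        new_score, pointers = [], []
        for j in range(n_tags):
            # Best previous tag i for reaching tag j at position t.
            best_i = max(range(n_tags), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emissions[t][j])
            pointers.append(best_i)
        score = new_score
        backpointers.append(pointers)
    # Start from the best final tag and follow the back-pointers.
    path = [max(range(n_tags), key=lambda j: score[j])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return [tags[j] for j in path]


# Hypothetical BIO tag set and scores for a three-token input.
tags = ["O", "B-PER", "I-PER"]
emissions = [[2.0, 0.5, 0.0], [0.0, 1.0, 1.5], [0.0, 0.2, 1.0]]
# All transitions allowed except O -> I-PER, which is heavily penalized.
transitions = [[0.0, 0.0, -1e4], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]

greedy = [tags[max(range(len(tags)), key=lambda j: row[j])] for row in emissions]
print(greedy)                                        # → ['O', 'I-PER', 'I-PER']
print(viterbi_decode(emissions, transitions, tags))  # → ['O', 'B-PER', 'I-PER']
```

Choosing the best tag for each token independently yields the ill-formed sequence O, I-PER, I-PER, whereas joint decoding with the transition penalty returns a well-formed name span.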
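The corpus-combination experiment can be pictured as varying how much out-of-domain news data is added to the in-domain script data. The sketch below assumes sentence-level corpora and a hypothetical `news_ratio` parameter (news sentences added per script sentence); the thesis reports only that an appropriate amount helps, not a specific ratio or sampling scheme.

```python
import random


def mix_corpora(script_sents, news_sents, news_ratio, seed=0):
    """Combine an in-domain script corpus with a random sample of news sentences.

    news_ratio is the number of news sentences added per script sentence, a
    hypothetical knob standing in for the "appropriate amount" to be tuned.
    """
    if news_ratio < 0:
        raise ValueError("news_ratio must be non-negative")
    rng = random.Random(seed)
    n_news = min(len(news_sents), round(news_ratio * len(script_sents)))
    mixed = list(script_sents) + rng.sample(list(news_sents), n_news)
    rng.shuffle(mixed)  # interleave the two sources before training
    return mixed


# Placeholder sentence IDs standing in for the two corpora.
script = [f"script-{i}" for i in range(10)]
news = [f"news-{i}" for i in range(100)]
for ratio in (0.0, 0.5, 2.0):
    mixed = mix_corpora(script, news, ratio)
    n_news = sum(s.startswith("news-") for s in mixed)
    print(ratio, len(mixed), n_news)
# → 0.0 10 0
# → 0.5 15 5
# → 2.0 30 20
```

In such a setup, each mixed corpus would be used to train a tagger and scored on held-out “Soldier Assault” data, and the ratio with the best F-value would be kept.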
Keywords/Search Tags: Chinese film and television scripts, named entity recognition, rule-based method, stop words, Bi-LSTM-CRF