Thematic Analysis Of Chinese Film Scripts

Posted on: 2020-01-05
Degree: Master
Type: Thesis
Country: China
Candidate: J Q Xue
Full Text: PDF
GTID: 2415330620958150
Subject: Software engineering
Abstract/Summary:
With the development of Internet technology, the network has gradually become an important way for people to obtain news. The volume of text data on the Internet has exploded, and the growth of Chinese film and television scripts is particularly prominent. Research on text categorization has made breakthrough progress, but the classification of Chinese film and television scripts still relies mainly on manual experience, which is costly and inefficient, and there is currently no research on classifying such scripts automatically. This paper studies topic extraction and script classification. Building on the topic generation model, it proposes a new hybrid model that uses natural language processing and machine learning: the LDA algorithm combined with a support vector machine (SVM) that uses a fused kernel function. Traditional topic generation models rely on the similarity between documents and paragraphs, paragraphs and sentences, and sentences and words, but ignore the similarity between one sentence and another. The LDA algorithm makes up for this shortcoming by analyzing the similarity between sentences and between words, so this paper uses LDA to measure the semantic similarity between scripts. Because script data is large and sparse, the method proceeds in four steps. First, it builds a weighted term matrix for the scripts using TF-IDF and reduces the dimension of the sample set's vector space with the ISOMAP method. Second, it proposes a model that combines cross entropy with perplexity to determine the optimal number of topics for LDA to extract. Third, it uses the LDA algorithm to mine the implicit keywords of each script through a script-topic approach. Finally, it classifies the topics with an SVM based on the proposed fused kernel function (polynomial kernel, conditionally positive definite kernel, and Gaussian kernel), so as to improve the generalization ability of the algorithm and the precision of topic extraction. To evaluate the LDA algorithm combined with the SVM, this paper uses 317 classic scripts as experimental data and compares the method with KNN, Bayesian, and SVM text classifiers that use different kernel functions. The results show that the proposed LDA-SVM algorithm classifies film and television scripts efficiently. The final classification precision for script topics exceeds 95%, and its classification performance is better than that of the KNN, Bayesian, and ordinary SVM classifiers.
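The fused kernel named in the abstract can be illustrated with a minimal sketch. The abstract does not give the kernel formulas, their parameters, or how the three kernels are weighted, so everything below is an assumption made for illustration: the conditionally positive definite kernel is taken to be the power form -||x - y||^beta, the fusion is taken to be a fixed weighted sum, and all function names, weights, and the toy topic vectors are hypothetical.

```python
import math


def polynomial_kernel(x, y, degree=2, coef0=1.0):
    # (x . y + coef0) ** degree; degree and coef0 are illustrative defaults.
    dot = sum(a * b for a, b in zip(x, y))
    return (dot + coef0) ** degree


def gaussian_kernel(x, y, gamma=0.5):
    # exp(-gamma * ||x - y||^2); gamma is an illustrative default.
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def cpd_kernel(x, y, beta=1.0):
    # Assumed conditionally positive definite form: -||x - y|| ** beta.
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return -(sq_dist ** (beta / 2.0))


def fused_kernel(x, y, weights=(0.4, 0.2, 0.4)):
    # Assumed fusion rule: a fixed weighted sum of the three kernels.
    # The weights are hypothetical, not taken from the thesis.
    w_poly, w_cpd, w_gauss = weights
    return (w_poly * polynomial_kernel(x, y)
            + w_cpd * cpd_kernel(x, y)
            + w_gauss * gaussian_kernel(x, y))


def gram_matrix(rows, kernel=fused_kernel):
    # Pairwise kernel values for every pair of feature vectors.
    return [[kernel(a, b) for b in rows] for a in rows]


# Toy per-script topic distributions, standing in for LDA output.
topic_vectors = [
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.6, 0.3, 0.1],
]
K = gram_matrix(topic_vectors)
```

A Gram matrix built this way could be handed to an SVM implementation that accepts precomputed kernels, for example scikit-learn's `SVC(kernel="precomputed")`. Because the assumed conditionally positive definite term is not positive definite on its own, the fused matrix is not guaranteed to be positive semidefinite for every choice of weights.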
Keywords/Search Tags: Chinese film script, ISOMAP, Dimension reduction, LDA, SVM, Kernel function fusion, Feature extraction