Study On Cluster Analysis And Visualization Method Of Film Review Data

Posted on: 2019-12-14
Degree: Master
Type: Thesis
Country: China
Candidate: X X Qu
Full Text: PDF
GTID: 2428330545453699
Subject: Computer Science and Technology
Abstract/Summary:
Film has long been a popular form of entertainment among the general public and an important part of the socialist cultural market. Thanks to government support and investment in recent years, China's film market is developing rapidly: the numbers of moviegoers, films released, cinemas, and screens are all growing quickly. At the same time, China's film industry is shifting its focus from quantity to quality. In this context, it is important for the film industry to fully understand the needs of the audience, which creates both a need to collect audience evaluations of films and challenges in processing and analyzing that data. A better understanding of the audience's needs and emotional expression will help in creating more popular movies in the future.

With the development of 'Internet+' technology, a large number of influential film information websites have emerged, such as shiguang.com, douban.com and Chinese box office. These websites have accumulated a large amount of basic data and review data about movies, which provides solid data sources for analysis of the film industry. In this paper, film review data were collected from such websites, and clustering analysis and visualization methods for these data were studied.

A web crawler targeting shiguang.com was used to select the top 25 films of each year from 2011 to 2016, 150 films in total, from among more than 3,000 films shown in mainland China. Basic information on these films was then collected along with their long reviews, more than 30,000 reviews in total. The reviews were segmented using Chinese word segmentation, a natural language processing method, and converted into word vectors. Keywords representing four key elements of a film, "plot", "image", "sound" and "filmmaker", were then extracted to build a keyword table for the reviews, which was supplemented using the TF-IDF method. The k-means clustering algorithm was applied to the 5-dimensional data of each film review, and correlation analysis was performed on the clustering results to find their differences. After performing linear regression on more than 38,000 long reviews of these 150 films, the key elements of the film that produced the separation point were studied in detail. The key elements were divided into finer pieces, called breakdown dimensions, and PCA was used to find the dimension that critics rank as most important. Finally, a method for sentiment analysis of long reviews is presented, together with a visualization scheme based on the coordinates of a Sankey graph and peace line. Through the analysis and visualization of the collected films, we can fully understand the audience's experience and feelings and provide support for the development of the film industry.
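The keyword-table, TF-IDF and k-means steps described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the keyword lists, the sample reviews, and the helper names (`KEYWORDS`, `tfidf`, `features`, `kmeans`) are all hypothetical, the reviews are assumed to be already segmented into tokens, and the farthest-point initialization is a choice made here for determinism. The abstract mentions 5-dimensional data without saying what the fifth dimension is, so the sketch uses only the four named elements.

```python
import math
from collections import Counter

# Hypothetical keyword table; the thesis's actual keyword lists are not given.
KEYWORDS = {
    "plot": {"story", "plot", "ending"},
    "image": {"visual", "scene", "color"},
    "sound": {"music", "score", "sound"},
    "filmmaker": {"director", "actor", "cast"},
}

def tfidf(docs):
    """Return one {term: tf-idf weight} dict per non-empty tokenized review."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({t: (c / len(doc)) * math.log(n / df[t])
                        for t, c in tf.items()})
    return weights

def features(docs):
    """One row per review: summed TF-IDF weight of each element's keywords."""
    return [[sum(w for t, w in row.items() if t in kws)
             for kws in KEYWORDS.values()]
            for row in tfidf(docs)]

def dist2(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=20):
    """Lloyd's k-means with a deterministic farthest-point initialization."""
    centers = [list(points[0])]
    while len(centers) < k:
        far = max(points, key=lambda p: min(dist2(p, c) for c in centers))
        centers.append(list(far))
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: attach each point to its nearest center.
        labels = [min(range(k), key=lambda j: dist2(p, centers[j]))
                  for p in points]
        # Update step: move each center to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers

# Made-up, pre-segmented sample reviews (illustration only).
docs = [
    ["story", "ending", "great", "story"],
    ["plot", "story", "weak"],
    ["music", "score", "loud"],
    ["sound", "music", "great"],
]
labels, centers = kmeans(features(docs), k=2)
print(labels)
```

In this toy input the first two reviews only mention plot keywords and the last two only mention sound keywords, so the two groups end up in different clusters. A real pipeline would first need a Chinese word segmenter and a much larger keyword table.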
Keywords/Search Tags: movie, data analysis, word vector, key elements, clustering, linear regression, visualization