| This system aims to collect data on almost all animation and comic works available on the web and to build a comprehensive, complete, and extensive data system. Building on this data system, it analyzes a large number of works to identify excellent animation and comic works and themes, and thereby provides investors with efficient investment advice and original authors with suggestions on subject matter, meeting the common needs of film investors and original authors. Using data collection tools, the system defines data acquisition rules tailored to the structure of each web page. More than 1,000 acquisition rules were designed to collect animation and comic data from across the web, covering 19 comic sites and 10 animation sites. This paper selects acquisition rules of moderate difficulty and uses them to discuss the process of building such rules, drawing on practical experience to describe the problems that may arise and how to design rules that address them. In addition, several design approaches are compared, the advantages and disadvantages of each are discussed, and the best rule design is chosen. Because the system has accumulated a large number of acquisition rules, arranging the acquisition tasks sensibly and making full use of the limited nodes in the cloud collection cluster has become an important issue. To address it, the paper describes the task scheduling function in detail, including key questions such as how to set up timed acquisition tasks, which strategy to use when choosing task start times, and how to order the acquisition tasks. The scheduling function plans and controls the start times of tasks for the large set of acquisition rules, which avoids a backlog of acquisition tasks, improves the utilization of the cloud collection cluster nodes, and avoids wasted resources. Because a large number of timed acquisition tasks run in the cloud, the cloud database accumulates a large amount of new data, and importing that data into the local database in a sensible, planned way becomes an important problem. The system's automatic export function imports data from the cloud database into the local MySQL database at one-hour intervals. This removes a large amount of repetitive manual export work and, because the data arrives in batches rather than all at once, eases the pressure on the local database. The MySQL database stores a large volume of raw data with a simple structure. This raw data alone is not sufficient to fully meet the requirements of data consumers, so the system's data analysis function analyzes and processes the raw data in the MySQL database. The core analysis models are built on the entropy weight method, the standard deviation method, the approximate ideal point method (TOPSIS), and grey relational analysis. During validation, different algorithms were combined to build numerous data analysis models, the results of the models were compared, and the best model was selected. This model runs as a scheduled script that performs the analysis and writes the results back to the database for data consumers to read and use. In addition, text analysis is performed on fans' and readers' reviews of the comics, and keyword word clouds are generated from the review corpus to summarize readers' views of each work, which makes it possible to effectively identify the works that are popular with fans. |
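
The abstract names the entropy weight method and the approximate ideal point method (TOPSIS) as building blocks of the analysis models but does not say how they were combined. The following is a minimal sketch of one common combination, in which entropy weights feed a TOPSIS ranking. It is not the system's actual model. It assumes every indicator is strictly positive and that larger values are better. The function names and the three indicators in the sample (views, favorites, comments) are hypothetical.

```python
import math


def entropy_weights(matrix):
    """Entropy weight method: one weight per indicator (column).

    Assumes at least two rows and strictly positive values.
    """
    m = len(matrix)
    n = len(matrix[0])
    k = 1.0 / math.log(m)  # scales each column's entropy into [0, 1]
    divergences = []
    for j in range(n):
        column = [row[j] for row in matrix]
        total = sum(column)
        entropy = 0.0
        for value in column:
            p = value / total  # share of this work in indicator j
            if p > 0:
                entropy -= k * p * math.log(p)
        # A column whose values differ more has lower entropy, higher weight.
        divergences.append(1.0 - entropy)
    total_divergence = sum(divergences)
    if total_divergence <= 0:
        # Every column is uniform: fall back to equal weights.
        return [1.0 / n] * n
    return [d / total_divergence for d in divergences]


def topsis_scores(matrix, weights):
    """TOPSIS closeness score per row; 1.0 is best, 0.0 is worst."""
    n = len(weights)
    # Vector-normalize each column, then apply the indicator weights.
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(n)]
    weighted = [
        [weights[j] * row[j] / norms[j] for j in range(n)] for row in matrix
    ]
    best = [max(row[j] for row in weighted) for j in range(n)]
    worst = [min(row[j] for row in weighted) for j in range(n)]
    scores = []
    for row in weighted:
        d_best = math.sqrt(sum((row[j] - best[j]) ** 2 for j in range(n)))
        d_worst = math.sqrt(sum((row[j] - worst[j]) ** 2 for j in range(n)))
        denominator = d_best + d_worst
        # Identical rows give a zero denominator; treat them as neutral.
        scores.append(d_worst / denominator if denominator > 0 else 0.5)
    return scores


# Hypothetical sample: rows are works, columns are views, favorites, comments.
sample = [
    [1000.0, 50.0, 20.0],
    [5000.0, 300.0, 80.0],
    [200.0, 10.0, 5.0],
]
sample_weights = entropy_weights(sample)
sample_scores = topsis_scores(sample, sample_weights)
for index, score in enumerate(sample_scores):
    print(f"work {index}: closeness = {score:.3f}")
```

In a pipeline like the one described, a scheduled script could compute such scores from the raw MySQL data and write them back for data consumers to read. The standard deviation method or grey relational analysis could be substituted at either step to produce the alternative models that the abstract says were compared.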