Anomaly Detection Method Based on Improved DBSCAN Algorithm and K-S Test

Posted on: 2022-03-19  Degree: Master  Type: Thesis
Country: China  Candidate: X Ma  Full Text: PDF
GTID: 2518306338487004  Subject: Computer Science and Technology
Abstract/Summary:
The audience rating reflects the popularity of a program and is a key reference index for program scheduling and advertising. It has an important economic and social impact on the audio-visual ecosystem. In an era of low ratings, a small amount of sample data pollution, whether caused by subjective or objective factors, can drastically change the ratings results, and the credibility of ratings has been questioned as a result. In recent years, with the development and application of big data technology in the radio and television industry, it has become possible to calculate accurate audience ratings from full viewing data. However, although the reliability of the rating calculation has been achieved at the national level, the possibility that the data is polluted at its source by audio-visual operators still cannot be ruled out. To address the credibility of viewing data at its source on the operator side, this thesis proposes a method for detecting anomalies in source-side data by comparing a sampled data set with the data set reported by the operator.

First, a framework for anomaly detection on source-side data is proposed. Using data probes, a large sampled data set is formed by random sampling on the operator side. Data cleaning and preprocessing are applied to both the sampled data set and the operator-reported data set to obtain a standardized representation of users' viewing behavior over a given time period. The two data sets are then analyzed along a clustering dimension and a statistical dimension to judge how they differ.

Second, in the clustering dimension, the KNNP-DBSCAN algorithm is proposed to address the difficulty of parameter selection and the low time efficiency of the traditional DBSCAN algorithm. KNNP-DBSCAN determines suitable parameters automatically and is parallelized on the basis of grid partitioning. Experiments and simulations on commonly used artificial data sets and on viewing data sets show that the algorithm keeps the clustering quality stable while offering better performance.

Third, in the statistical dimension, both standard statistical analysis and distribution analysis are carried out. The standard statistical analysis evaluates the degree of difference between the data sets from their overall characteristics by computing the expected value, standard deviation, skewness and kurtosis. For the distribution analysis, given that the viewing data is discrete and clustered in distribution, an improved two-sample K-S test based on fixed intervals and a weighting function is proposed for discretized samples. Experiments show that the method is effective on ratings data sets and can be used to judge whether the distributions of two data sets differ.

Finally, the effectiveness of the anomaly detection method proposed in this thesis is verified by experiments on the viewing data set.
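As an illustration of the distribution comparison described above, the following is a minimal sketch of the classical two-sample K-S test, not the improved fixed-interval, weighted variant proposed in the thesis, whose details are not given in this abstract. The function names `ks_statistic` and `ks_reject`, the sample values, and the large-sample critical-value approximation are assumptions made for illustration only.

```python
from bisect import bisect_right
import math


def ks_statistic(sample_a, sample_b):
    """Classical two-sample K-S statistic: the largest absolute gap
    between the empirical CDFs of the two samples."""
    xs, ys = sorted(sample_a), sorted(sample_b)
    n, m = len(xs), len(ys)
    if n == 0 or m == 0:
        raise ValueError("both samples must be non-empty")
    d = 0.0
    # The empirical CDFs only change at observed values, so it is
    # enough to compare them at every distinct observation.
    for v in set(xs) | set(ys):
        cdf_a = bisect_right(xs, v) / n  # share of sample_a <= v
        cdf_b = bisect_right(ys, v) / m  # share of sample_b <= v
        d = max(d, abs(cdf_a - cdf_b))
    return d


def ks_reject(sample_a, sample_b, alpha=0.05):
    """Return True if the samples differ at level alpha, using the usual
    large-sample critical value c(alpha) * sqrt((n + m) / (n * m))."""
    n, m = len(sample_a), len(sample_b)
    c_alpha = math.sqrt(-math.log(alpha / 2.0) / 2.0)
    critical = c_alpha * math.sqrt((n + m) / (n * m))
    return ks_statistic(sample_a, sample_b) > critical


# Hypothetical viewing durations (minutes) for a sampled and a reported set.
sampled = [1, 2, 3, 4]
reported = [3, 4, 5, 6]
d_value = ks_statistic(sampled, reported)
```

With samples this small the large-sample critical value is only a rough guide; ties in discretized data also make the classical test conservative, which is presumably part of what the thesis's improved variant addresses.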
Keywords/Search Tags: Anomaly detection of viewing data, DBSCAN algorithm, two-sample K-S test, data set comparison