Research And Implementation Of College Students' Identification Of Poor Students Based On Hadoop Platform

Posted on:2020-09-29

Degree:Master

Type:Thesis

Country:China

Candidate:S J Zhang

Full Text:PDF

GTID:2428330623456306

Subject:Computer Science and Technology

Abstract/Summary:

In recent years, with the rapid development of information technology, digital campus construction has largely been completed in domestic universities. At the same time, a large amount of data is generated and stored. Effectively mining information about students' behavior at school helps university staff with management. Identifying and funding poor students in colleges and universities is very important both for cultivating higher-level talent and for easing the burden on poverty-stricken families. Targeted educational poverty alleviation helps students in genuine difficulty obtain subsidies and complete their studies, prevents families from falling into poverty because of education costs, and supports the national policy of helping students out of poverty. Traditional data analysis methods can no longer meet the needs of massive data analysis, and traditional data mining algorithms also have many drawbacks. To address poverty determination in colleges on the Hadoop platform, this paper analyzes the Hadoop framework and data mining algorithms, takes campus card consumption data and online log data as the research objects, and applies the Canopy-K-means clustering algorithm. Cluster analysis of students' behavior at school is carried out and the poor students are identified, providing decision support for the staff responsible for identifying poor students. This paper mainly studies the following three aspects:

First, traditional data mining algorithms, when applied in practice, may give unsatisfactory mining results because of their defects, and the resulting analysis may be biased. In response, this paper proposes the Canopy-K-means algorithm, an improvement on the traditional K-means clustering algorithm. It effectively addresses the problem of choosing the initial k center points in traditional K-means and the difficulty of handling abnormal points. It is also applied to the prediction of poor students in colleges and universities: students are clustered to find the categories of poor students, helping college staff identify them.

Second, the digital campus generates and stores a large amount of data, and single-machine data analysis can hardly complete the mining of massive logs. To process data efficiently, this paper designs and implements a data analysis system based on the Hadoop platform to mine students' behavioral data. The system architecture is divided into a data preprocessing module, a data mining module, and a storage module. The data preprocessing module mainly filters and extracts data. The data mining module mainly performs data modeling based on the features extracted during preprocessing and uses the Canopy-K-means algorithm for parallel clustering mining to find the poor student category.

Third, statistical analysis of the card consumption data and the online log data helps the school manage the canteen and the network. The poverty feature data set was extracted, and the Canopy-K-means and K-means algorithms were compared in single-machine experiments. Cluster speedup-ratio experiments were also run on different machines. The experimental results show that the Canopy-K-means algorithm performs well on a single machine compared with the K-means algorithm, and that its performance is also demonstrated well in the cluster.
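
The idea of using Canopy to choose the initial centers for K-means can be illustrated with a minimal single-machine sketch. This is not the thesis's implementation, which the abstract does not show. The thresholds `t1` and `t2`, the `min_size` rule for discarding sparse canopies as likely abnormal points, the function names, and the two-feature sample data are all assumptions made for illustration.

```python
import math


def dist(a, b):
    """Euclidean distance between two equal-length tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def canopy_centers(points, t1, t2, min_size=2):
    """Pick canopy centers with loose threshold t1 and tight threshold t2.

    Assumption: a canopy with fewer than min_size members within t1 is
    treated as an abnormal point and does not seed a cluster.
    """
    if t1 <= t2:
        raise ValueError("t1 must be larger than t2")
    remaining = list(points)
    centers = []
    while remaining:
        center = remaining[0]
        members = [p for p in points if dist(center, p) <= t1]
        if len(members) >= min_size:
            centers.append(center)
        # Points within the tight threshold can no longer become centers.
        remaining = [p for p in remaining if dist(center, p) > t2]
    return centers


def assign(points, centers):
    """Index of the nearest center for every point."""
    return [min(range(len(centers)), key=lambda i: dist(p, centers[i]))
            for p in points]


def kmeans(points, seeds, max_iter=100):
    """Standard K-means iterations starting from the given seed centers."""
    centers = [tuple(c) for c in seeds]
    for _ in range(max_iter):
        labels = assign(points, centers)
        new_centers = []
        for i, old in enumerate(centers):
            cluster = [p for p, lab in zip(points, labels) if lab == i]
            if cluster:
                new_centers.append(
                    tuple(sum(col) / len(cluster) for col in zip(*cluster)))
            else:
                new_centers.append(old)  # keep the center of an empty cluster
        if new_centers == centers:
            break
        centers = new_centers
    return centers, assign(points, centers)


# Made-up two-feature records, e.g. (scaled card spending, scaled online time).
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1),
          (8.0, 8.0), (8.2, 7.9), (7.8, 8.1),
          (20.0, 0.0)]  # the last record is an isolated point

seeds = canopy_centers(points, t1=3.0, t2=1.5)
centers, labels = kmeans(points, seeds)
print(len(seeds), labels)
```

In this sketch the number of clusters k is not chosen by hand: it is the number of canopies that survive the `min_size` check, so the isolated record does not become an initial center. On a Hadoop cluster the assignment step would typically run in mappers and the center update in reducers, but that part is outside this sketch.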

Keywords/Search Tags:

Hadoop, poor student identification, Canopy-K-means algorithm, data mining

Related items

1	Research On Distributed Parallel Data Mining Algorithm Based On Weblog
2	Design And Analysis Based On The Data Mining Technology Of Poor Students’ Identification System In Colledges
3	The Research Of Clustering Mining Based On Logistics History Data On The Hadoop
4	Research On Hot Topics Discovery In Microblog Based On Distributed K-means Algorithms
5	Research On SVM Classification Of Unbalanced Data And Its Application In Identify Poor Students In Colleges And Universities
6	Based On Hadoop Data Mining Algorithm Analysis And Research
7	Research On Algorithm Of Data Mining Based On Hadoop
8	Research And Application Of Hadoop Distributed Clustering Mining Method Based On Virtual Machine
9	Research On Parallelization Of Data Mining Algorithm Based On Distributed Platforms Spark And YARN
10	Study On Key Techniques Of Distributed Data Mining Based On Hadoop