
Massive Academic Resources Classification Research For Personalized Recommender

Posted on: 2018-10-08
Degree: Master
Type: Thesis
Country: China
Candidate: Y Gao
Full Text: PDF
GTID: 2348330536986033
Subject: Engineering
Abstract/Summary:
With the advent of the big data era, information resources have grown explosively. Hundreds of millions of academic resources are produced every year. These resources are of great help to students, teachers and researchers, but their mass production also creates problems of resource organization and retrieval, leaving a large number of high-value resources buried. An effective way to solve this problem is to build a personalized recommendation system for academic resources based on big data technology, which helps users efficiently find the academic resources they really need by having “resources take the initiative to match the user”. An outstanding recommendation system needs fast access to resources, accurate classification and organization, and a personalized model built from user behavior.

This thesis studies in depth the automatic classification of massive academic resources for personalized recommendation. Taking into account the characteristics of different types of academic resources, it designs different classification models, including single-classifier and multiple-classifier models, and introduces a keyword extension method based on collaborative filtering to solve the problem of an inadequate corpus. Classification accuracy is improved by treating each type of resource according to its specific conditions. The main work of the thesis is as follows:

(1) It analyzes the characteristics of theses and patent data, selects the Bayes model as the target classifier, and describes the keyword extraction algorithm in detail.

(2) It proposes a related-keyword extension method based on collaborative filtering to address the lack of learning samples for news and blogs. The method increases the amount of information available and thereby improves classification accuracy.

(3) It adopts ensemble learning for the conference title classification task. The method improves the random forest classification model by replacing the underlying decision trees with Bayes classifiers. This retains the stability and generalization ability of random forest while solving the data sparseness problem that the vector space model poses for decision trees.

(4) It designs and implements, on the Hadoop platform, a word segmentation task for massive academic resources, a task that extracts the parameters needed for TF-IDF, and a training task for the classification model, solving the low efficiency of processing massive text data in traditional standalone mode.
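The ensemble idea in item (3) can be illustrated with a small sketch. The abstract does not give the thesis's actual implementation, so everything here is an assumption: the class names `NaiveBayes` and `BayesForest`, the parameters `n_estimators` and `feature_fraction`, the use of multinomial naive Bayes with Laplace smoothing, and the toy titles. The sketch keeps the two sources of randomness of a standard random forest (bootstrap sampling of documents and a random subset of features per member) but trains a Bayes classifier, not a decision tree, as each member, and combines members by majority vote.

```python
import math
import random
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial naive Bayes over token lists, with Laplace smoothing."""

    def fit(self, docs, labels, vocab):
        self.vocab = set(vocab)              # features this member may use
        self.class_counts = Counter(labels)  # documents per class
        self.word_counts = defaultdict(Counter)
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(t for t in tokens if t in self.vocab)
        self.total = len(labels)
        return self

    def predict(self, tokens):
        best_label, best_score = None, -math.inf
        for label in sorted(self.class_counts):
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # log prior + sum of smoothed log likelihoods of in-vocabulary tokens
            score = math.log(self.class_counts[label] / self.total)
            for t in tokens:
                if t in self.vocab:
                    score += math.log((counts[t] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


class BayesForest:
    """Hypothetical random-forest-style ensemble with Bayes members."""

    def __init__(self, n_estimators=25, feature_fraction=0.6, seed=0):
        self.n_estimators = n_estimators
        self.feature_fraction = feature_fraction
        self.seed = seed

    def fit(self, docs, labels):
        rng = random.Random(self.seed)
        vocab = sorted({t for d in docs for t in d})
        k = max(1, int(len(vocab) * self.feature_fraction))
        n = len(docs)
        self.members = []
        for _ in range(self.n_estimators):
            idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample
            feats = rng.sample(vocab, k)                # random feature subset
            member = NaiveBayes().fit(
                [docs[i] for i in idx], [labels[i] for i in idx], feats
            )
            self.members.append(member)
        return self

    def predict(self, tokens):
        # Majority vote over the members' individual predictions.
        votes = Counter(m.predict(tokens) for m in self.members)
        return votes.most_common(1)[0][0]


# Toy, made-up conference-title tokens for illustration only.
titles = [
    ["neural", "network", "learning"], ["deep", "learning", "model"],
    ["neural", "model", "training"], ["learning", "training", "network"],
    ["query", "index", "storage"], ["transaction", "storage", "query"],
    ["index", "transaction", "database"], ["database", "query", "index"],
]
topics = ["ml"] * 4 + ["db"] * 4

forest = BayesForest().fit(titles, topics)
print(forest.predict(["deep", "neural", "network"]))
```

Because each Bayes member scores a title only on the tokens that appear in it, short and sparse inputs such as conference titles do not force splits on mostly-zero features, which is the sparseness problem the abstract attributes to decision trees on the vector space model.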
Keywords/Search Tags:text categorization, bayes model, feature expansion, ensemble learning, Hadoop platfor