The Intelligent Storage And Mining Of Big Scholarly Data Based On Distributed Architecture

Posted on:2019-01-01

Degree:Master

Type:Thesis

Country:China

Candidate:X Y Luo


GTID:2417330590967384

Subject:Computer Science and Technology

Abstract/Summary:


Scientific research is a strategic pillar for raising society's productive forces and a country's comprehensive national strength. Worldwide, millions of scholarly publications are produced every year in computer science, basic science, medicine, economics, and sociology. At the same time, the rapid development and popularization of the Internet has made disseminating and sharing scholarly literature very easy, ushering in the era of big scholarly data. Faced with such vast academic information resources, storing and mining them intelligently is an important task. It mainly involves three areas of computer science: database systems, distributed computing, and machine learning. This thesis takes the academic search system Ace Map (also called Paper Book) as its research object and stores academic entities and their logical relationships in purpose-designed relational tables. It then proposes two optimization approaches to the bottleneck of SQL query performance, one for a traditional relational database environment and one for a distributed architecture. Finally, it explores applications of a distributed machine learning framework in Ace Map. The main contributions of this thesis are:

· Used the window functions mechanism (partitioning, ordering, framing) to optimize the many analytical SQL queries in the Ace Map system. Experimental results show that this optimization improves system performance to a certain extent, reducing query execution time by up to 18.6 percent.

· Completed the synchronous migration of some of the large academic data sets to the Hadoop Distributed File System and applied the SQL-on-Hadoop framework Spark SQL to run complex queries. The parameters of the Spark cluster (those concerning Spark executors) were also tuned to the data set volume and the architecture of the distributed cluster. Experimental results show that this optimization greatly improves system performance, reducing query execution time by up to 93.9 percent.

· Applied the distributed machine learning framework Spark MLlib to mine academic topics, which expands and improves the knowledge discovery capability of the Ace Map system.
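The three window-function elements named in the first contribution (partitioning, ordering, framing) can be illustrated with a minimal sketch. It is not the thesis's actual schema or queries: the `papers` table, its columns, and the sample rows are hypothetical, and SQLite (version 3.25 or later, reached through Python's standard `sqlite3` module) stands in for the production database.

```python
import sqlite3

# Hypothetical toy table: one row per paper (not the Ace Map schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE papers (paper_id INTEGER, venue TEXT, year INTEGER, citations INTEGER)"
)
conn.executemany(
    "INSERT INTO papers VALUES (?, ?, ?, ?)",
    [
        (1, "VLDB", 2015, 40),
        (2, "VLDB", 2016, 10),
        (3, "VLDB", 2017, 25),
        (4, "KDD", 2015, 5),
        (5, "KDD", 2016, 30),
    ],
)

# A single pass computes two per-venue analytics that would otherwise
# need self-joins or correlated subqueries:
#   PARTITION BY venue        -> partitioning
#   ORDER BY citations / year -> ordering
#   ROWS BETWEEN ...          -> framing (running total up to the current row)
query = """
SELECT paper_id, venue, year,
       RANK() OVER (PARTITION BY venue ORDER BY citations DESC) AS rank_in_venue,
       SUM(citations) OVER (
           PARTITION BY venue ORDER BY year
           ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
       ) AS running_citations
FROM papers
ORDER BY venue, year
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Each output row keeps the paper's identity while carrying its rank within the venue and the venue's cumulative citation count, which is the kind of analytical query the abstract describes rewriting.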

Keywords/Search Tags:

Relational Database, SQL, Window Functions, SQL-on-Hadoop, Spark SQL, Machine Learning


Related items

1	Application Of Machine Learning Classification Algorithm In Chinese Databases
2	Scientific Research Evaluation Of Universities In China Based On DEA Window Analysis Model With An Optimal Window Length
3	Applied Research On The "Window Of Learning And Thinking" Column Of "Outline Of Chinese And Foreign History"
4	Research On The Design Of Learning Resource Distribution Model Based On Hadoop
5	The Research On Students’ Online Learning Behavior Based On Hadoop
6	The Design And Application Of University Teaching Resource Database System Based On Cloud Platform
7	Research On The Application Of The Column "Window Of Learning And Thinking" In "Outline Of Chinese And Foreign History" Under The Core Literacy
8	Design And Implementation Of University Academic Warning System Based On Hadoop
9	A Study On The Strategy Of Cultivating National Feelings In The "Window Of Learning And Thinking" Of The Unified Compilation Of High School History Textbooks
10	The Improvement And Application Of The Extreme Learning Machine Algorithm