Research And Application Of Data Mining Based On Web Literature

Posted on:2012-09-08

Degree:Master

Type:Thesis

Country:China

Candidate:Z P Gong

Full Text:PDF

GTID:2218330338966278

Subject:Education Technology

Abstract/Summary:

With the development of higher education, the number of university students has grown from hundreds of thousands to several million over the past few years. The government provides substantial funding, and a large number of research projects are therefore generated each year. As Web documents accumulate, it becomes difficult to find useful information in the mass of literature data, let alone to do so efficiently. The main purpose of this thesis is to use data mining technology to extract useful information from a large body of literature data for further guidance.

To find data mining algorithms suited to large volumes of literature data, this thesis first gives a brief introduction to the theory of data mining and presents a general process and formula for calculating text similarity. It then analyzes several clustering algorithms and identifies their deficiencies. Based on the principles of clustering quality assessment and the idea of incremental clustering, we design a cohesion-based incremental clustering algorithm that makes up for the deficiencies of the algorithms analyzed. The parameters of the clustering algorithm are then optimized through experiments. By referring to the relevant literature and analyzing the test results of the PaperPass software, we obtain a method for calculating the degree of similarity, which helps in detecting plagiarized documents. Moreover, the similarity calculation algorithm is improved on the basis of the vector space model. Finally, drawing on Web crawler techniques, a literature-focused crawler is designed and implemented to collect a large volume of Web document data.

To apply the above algorithms and provide users with an operational platform, a Web-based data mining system is designed. The thesis analyzes the goals and characteristics of the system, selects the technical approach, defines the system architecture, its functions, and the division into main modules, and designs the system database. Finally, the operation and deployment methods for the system are given, and some of its functions are demonstrated.
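The abstract names vector-space text similarity and incremental clustering but does not give the thesis's formulas or its cohesion criterion. The following is therefore only a minimal sketch under stated assumptions: plain term-frequency vectors with whitespace tokenization, cosine similarity as the similarity measure, and a simple single-pass threshold rule in place of the cohesion-based rule. The function names and the `threshold` value are hypothetical and are not taken from the thesis.

```python
import math
from collections import Counter


def term_vector(text):
    """Build a sparse term-frequency vector (assumption: whitespace tokens)."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is similar to nothing
    return dot / (norm_a * norm_b)


def incremental_cluster(documents, threshold=0.5):
    """Single-pass clustering: each new document joins the most similar
    existing cluster if the similarity reaches `threshold` (a hypothetical
    parameter), otherwise it starts a new cluster. Earlier assignments are
    never revisited, which is what makes the procedure incremental."""
    clusters = []  # each cluster: {"centroid": Counter, "members": [indices]}
    for index, text in enumerate(documents):
        vector = term_vector(text)
        best, best_score = None, threshold
        for cluster in clusters:
            score = cosine_similarity(vector, cluster["centroid"])
            if score >= best_score:
                best, best_score = cluster, score
        if best is None:
            clusters.append({"centroid": Counter(vector), "members": [index]})
        else:
            # Summed term counts stand in for the centroid; cosine
            # similarity ignores the overall scale of the vector.
            best["centroid"].update(vector)
            best["members"].append(index)
    return clusters


if __name__ == "__main__":
    docs = [
        "data mining clustering algorithm",
        "clustering algorithm for data mining",
        "web crawler design",
    ]
    for cluster in incremental_cluster(docs):
        print(cluster["members"])
```

In this sketch a new document only ever updates the cluster it joins, so documents can be processed as a crawler delivers them without re-clustering the existing collection. A cohesion-based variant would replace the threshold test with a measure of how tightly a cluster holds together after the document is added.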

Keywords/Search Tags:

Data mining, Incremental clustering algorithm, Literature focused crawler, Text Similarity

Related items

1	Research Of Focused Crawler About Group Of University Website Based On RSS
2	Focused Crawler Based On Incremental Bayes Algorithm
3	Research And Implementation Of On Semi-automatic Ontology Construction Base On WordNet And Focused Crawler
4	Research On Search Strategy And Key Techniques Of Focused Crawler
5	Research And System Realization On Focused Web Searching And Mining
6	Research On Topic Focused Web Crawler And Related Technologies
7	Research And Implement Of Distributed Focused Crawler
8	Research On Topic Web Crawler For Web Text Mining
9	Study On Similarity-based Text Clustering Algorithm And Its Application
10	Research And Implementation Of Focused Crawler Oriented To Engineering Technology