
Research And Application Of Data Mining Based On Web Literature

Posted on: 2012-09-08
Degree: Master
Type: Thesis
Country: China
Candidate: Z P Gong
Full Text: PDF
GTID: 2218330338966278
Subject: Education Technology
Abstract/Summary:
With the development of higher education, the number of university students has grown from hundreds of thousands to several million over the past few years. The government provides substantial funding, and a large number of research projects are therefore generated each year. As Web documents accumulate, it becomes difficult to find useful information in the mass of literature data, let alone to do so efficiently. The main purpose of this thesis is to use data mining technology to extract useful information from this large body of literature data for further guidance.

To find data mining algorithms suited to large volumes of literature data, this thesis first gives a brief introduction to the theory of data mining and presents a general process and formula for calculating text similarity. It then analyzes several clustering algorithms and identifies some of their deficiencies. Following the principles of clustering-effect assessment and the idea of incremental clustering, we design a cohesion-based incremental clustering algorithm that makes up for the deficiencies of the algorithms analyzed. The parameters of the clustering algorithm are then optimized through experiments. By referring to the relevant literature and analyzing the test results of the PaperPass software, a method for calculating the degree of similarity is obtained, which helps in detecting plagiarized documents. Moreover, the similarity calculation algorithm is improved on the basis of the vector space model. Finally, drawing on knowledge of web crawlers, a literature-focused crawler is designed and implemented to obtain a large amount of Web document data.

In order to apply the above-mentioned algorithms and provide users with an operational platform, a Web-based data mining system is designed.
This thesis analyzes the goals and characteristics of the system, selects the relevant technical approach, then completes the system structure, the functions, and the division into main modules, and finally designs the system database. In the end, methods for operating and deploying the system are given, and demos of some relevant functions are presented.
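
As a rough illustration of the ideas named in the abstract, the sketch below computes text similarity as the cosine between term-frequency vectors and uses it in a simple single-pass incremental clustering. It is not the thesis's cohesion-based algorithm, whose details are not given here. The function names, the tokenizer, the summed-count centroid, and the threshold value of 0.3 are all assumptions made for this example.

```python
import math
import re
from collections import Counter


def term_vector(text):
    """Bag-of-words term-frequency vector for one document (assumed tokenizer)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term-frequency vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def incremental_cluster(documents, threshold=0.3):
    """Single-pass clustering: each document joins the most similar existing
    cluster, or starts a new cluster if no similarity reaches the threshold.
    The threshold is a hypothetical value, not one taken from the thesis."""
    clusters = []  # each cluster: {"centroid": Counter, "members": [indices]}
    for index, text in enumerate(documents):
        vector = term_vector(text)
        best, best_score = None, 0.0
        for cluster in clusters:
            score = cosine_similarity(vector, cluster["centroid"])
            if score > best_score:
                best, best_score = cluster, score
        if best is not None and best_score >= threshold:
            best["members"].append(index)
            # Summed term counts serve as the centroid; cosine ignores scale.
            best["centroid"].update(vector)
        else:
            clusters.append({"centroid": Counter(vector), "members": [index]})
    return clusters


if __name__ == "__main__":
    docs = [
        "data mining clustering algorithm",
        "clustering algorithm for data mining",
        "web crawler fetches pages",
    ]
    for cluster in incremental_cluster(docs):
        print(cluster["members"])
```

Because documents are processed one at a time and earlier assignments are never revisited, new literature can be added without re-clustering the whole collection, which is the general appeal of an incremental scheme. The result does depend on the order in which documents arrive.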
Keywords/Search Tags:Data mining, Incremental clustering algorithm, Literature focused crawler, Text Similarity