
The Research And Application Of Approximate Resource Detection Technology In Education Website

Posted on: 2013-04-24  Degree: Master  Type: Thesis
Country: China  Candidate: C C Zhang  Full Text: PDF
GTID: 2248330371994597  Subject: Computer application technology
Abstract/Summary:
With the development of information technology in education, the theme databases of educational websites have grown massive, and more and more approximate (near-duplicate) resources fill them, which causes great inconvenience to users. Guided by text mining theory and combined with software engineering practice, this thesis focused on filtering approximate resources out of the theme database of an education website efficiently, accurately, and comprehensively.

First, it described the theory the work relies on: it summarized the concept of text mining, its processing steps, application fields, and development prospects, and then analyzed clustering algorithms in text mining in depth. Second, considering the short-text characteristics of themes on the education website, it analyzed the shortcomings of traditional clustering methods and, on that basis, improved the clustering algorithm in terms of feature extraction, similarity measurement, and adaptive threshold setting, and proposed a parallel clustering algorithm based on searching short theme texts. Third, the improved algorithm was applied in a similarity-checking system, and the system's performance was presented. Finally, it applied a number of techniques concerning word tokenizer selection, massive data, and high-concurrency processing to enhance system performance. The experimental results showed that the improved algorithm achieved better similarity-checking performance.

The thesis used the K-Means, DBSCAN, and hierarchical clustering algorithms. First, it improved feature extraction, the key step of clustering, by adding features such as the knowledge point, the coefficient of difficulty, and the theme type distance. Because the feature window was enlarged, the improved feature extraction method could identify themes more effectively. Second, it changed the traditional similarity computation by adding a hierarchy coordination factor to suit the new feature extraction. Third, it used the idea of mathematical expectation to improve threshold setting in the clustering algorithm, effectively reducing noise and outlier points. Finally, clustering could be performed in parallel on a Hadoop cloud platform, which sped up the clustering of large amounts of data.

The research could cluster short theme texts efficiently and accurately. The similarity-checking system designed and implemented in this thesis met the goal of detecting approximate resources and could clear similar themes from the theme database regularly and automatically. The system was used on an educational website, where it reduced the workload of discipline editors, improved theme quality, and made the site easier for users.
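The abstract does not give the thesis's formulas, so the following is only a minimal Python sketch of the general idea it describes: combine a short-text similarity with a metadata feature, then derive the cut-off from the expected (mean) pairwise similarity instead of fixing it by hand. Everything specific here is an assumption for illustration and is not taken from the thesis: the character-bigram Jaccard measure, the `text_weight` blend, the `knowledge_point` field name, and the "mean plus one standard deviation" threshold rule.

```python
from itertools import combinations
from statistics import mean, pstdev


def char_ngrams(text, n=2):
    """Return the set of character n-grams of a lowercased, whitespace-normalized string."""
    s = " ".join(text.lower().split())
    if len(s) < n:
        return {s}
    return {s[i:i + n] for i in range(len(s) - n + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets (1.0 when both are empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def similarity(x, y, text_weight=0.8):
    """Blend text similarity with a metadata match (hypothetical weighting)."""
    text_sim = jaccard(char_ngrams(x["text"]), char_ngrams(y["text"]))
    meta_sim = 1.0 if x["knowledge_point"] == y["knowledge_point"] else 0.0
    return text_weight * text_sim + (1.0 - text_weight) * meta_sim


def find_near_duplicates(items):
    """Return (flagged index pairs, threshold) using a data-driven threshold."""
    scores = {
        (i, j): similarity(items[i], items[j])
        for i, j in combinations(range(len(items)), 2)
    }
    # Assumed rule: expectation of the pairwise similarities plus one
    # standard deviation, so only unusually similar pairs are flagged.
    threshold = mean(scores.values()) + pstdev(scores.values())
    flagged = [pair for pair, s in scores.items() if s > threshold]
    return flagged, threshold


items = [
    {"text": "Solve the equation 2x + 3 = 7", "knowledge_point": "linear equations"},
    {"text": "Solve the equation 2x + 3 = 7.", "knowledge_point": "linear equations"},
    {"text": "Name the capital city of France", "knowledge_point": "geography"},
    {"text": "Compute the area of a circle with radius 3", "knowledge_point": "geometry"},
]

pairs, threshold = find_near_duplicates(items)
print(pairs, round(threshold, 3))
```

In this toy data only the first two items, which differ by a trailing period, exceed the threshold. The all-pairs comparison is quadratic in the number of items, which is one reason a large theme database would call for the kind of parallel processing the abstract mentions.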
Keywords/Search Tags: Text Mining, Clustering, Natural Language Processing, Similar Resources, Theme Similarity-checking