
Web Data Mining Applied In The Teaching Resources Search Engine

Posted on: 2008-11-28
Degree: Master
Type: Thesis
Country: China
Candidate: K Li
GTID: 2178360215979111
Subject: Computer software and theory

Abstract/Summary:
In this paper, we explore a design for a personalized Chinese educational-resource search engine oriented to the needs of elementary and secondary education. As both the user base and the teaching-resource database grow, two problems become increasingly important for search engines under current conditions: understanding each user's query need accurately and returning results that reflect both its intension and its extension as precisely as possible, and, when a user's query conditions are vague or ill-defined, returning the results grouped into clusters on the results page.

This paper focuses on applying Web data mining to search engines. Web data mining has three main branches: content mining, structure mining, and usage mining. Web structure mining derives knowledge from the structure of the WWW, of Web documents, and of hyperlinks. For a search engine, a link-structure model can be built by analyzing the number and targets of a page's or a website's in-links and out-links. Studying hyperlink-based algorithms such as PageRank can guide link optimization, steadily improve a website's ranking, and avoid the confusing results of optimizing blindly. The paper concentrates on PageRank, currently the dominant algorithm of this kind: how it is computed and how a page's link structure affects its PageRank value. It analyzes the algorithm's results for independent websites under several link models, including inbound and outbound links, and proposes corresponding optimization strategies.
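The abstract does not state the PageRank formula it analyzes. For reference, the widely used normalized form is PR(p) = (1 - d)/N + d * sum of PR(q)/L(q) over all pages q linking to p, where N is the number of pages, L(q) is the number of out-links of q, and d is the damping factor. The minimal Python sketch below computes this by power iteration and compares a page's rank before and after it gains one inbound link. It is an illustration, not the thesis's implementation: the damping factor d = 0.85, the convention of spreading the rank of pages without out-links evenly over all pages, and the three-page example site are assumptions.

```python
def pagerank(links, d=0.85, tol=1e-10, max_iter=1000):
    """Classic PageRank by power iteration (normalized so ranks sum to 1).

    links maps each page to the pages it links to (its out-links); in-links are
    implied by the same mapping. Pages without out-links ("dangling" pages)
    spread their rank evenly over all pages, which is an assumed convention.
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    out = {p: set(links.get(p, ())) for p in pages}  # de-duplicate repeated links
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(max_iter):
        dangling = sum(rank[p] for p in pages if not out[p])
        # Random-jump term plus the evenly redistributed dangling mass.
        new = {p: (1.0 - d) / n + d * dangling / n for p in pages}
        for src in pages:
            if out[src]:
                share = d * rank[src] / len(out[src])  # rank split over out-links
                for dst in out[src]:
                    new[dst] += share
        # Stop once the L1 change between iterations is negligible.
        if sum(abs(new[p] - rank[p]) for p in pages) < tol:
            return new
        rank = new
    return rank


if __name__ == "__main__":
    # Hypothetical site: a home page linking to two subpages that link back.
    site = {"home": ["a", "b"], "a": ["home"], "b": ["home"]}
    # The same site after page "b" adds an inbound link to page "a".
    boosted = {"home": ["a", "b"], "a": ["home"], "b": ["home", "a"]}
    for name, graph in (("base", site), ("boosted", boosted)):
        ranks = pagerank(graph)
        print(name, {p: round(r, 3) for p, r in ranks.items()})
```

In this toy example page "a" rises from about 0.257 to about 0.333 once "b" also links to it, while "home" loses rank because "b" now splits its vote between two out-links. This is the kind of inbound/outbound trade-off that link-optimization strategies have to weigh.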
Finally, the paper summarizes the strengths and weaknesses of PageRank, proposes an improved PageRank algorithm that addresses the topic-drift phenomenon, and validates it.

For content mining, the author studies how to use Web mining techniques, combined with existing clustering techniques, to classify and cluster Web text data with high accuracy. This thesis proposes using Web content mining and structure mining to extract hierarchical classification information for the pages of an entire website, and then clustering the pages according to this hierarchical information. The author also introduces the suffix tree concept, in the form of Suffix Tree Clustering (STC), to cluster pages dynamically; this is a relatively new clustering algorithm. It is a novel, incremental method that runs in linear time, and the data structure it builds is very compact, saving a large amount of memory. The method is well suited to basic string-processing problems, and it offers a new line of thinking for future research in Chinese text mining.
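Suffix Tree Clustering, in its usual formulation, first finds phrases shared by several documents ("base clusters") and then merges base clusters whose document sets overlap by more than half of each; the connected groups become the final clusters. The Python sketch below illustrates only that two-step idea and is not the thesis's implementation. In particular, it enumerates word n-grams up to a fixed length instead of building an incremental, linear-time generalized suffix tree, so it does not have the memory savings the abstract describes. The scoring rule, the `max_len` and `top_k` parameters, and the example documents are hypothetical.

```python
from collections import defaultdict


def base_clusters(docs, max_len=4, min_docs=2):
    """Return {phrase: documents containing it} for phrases shared by >= min_docs docs.

    Simplification: rather than building a generalized suffix tree incrementally in
    linear time, this enumerates every word n-gram of length 1..max_len, which
    corresponds to the nodes of a depth-limited suffix trie.
    """
    phrase_docs = defaultdict(set)
    for doc_id, words in enumerate(docs):
        for i in range(len(words)):
            for j in range(i + 1, min(i + max_len, len(words)) + 1):
                phrase_docs[tuple(words[i:j])].add(doc_id)
    return {p: frozenset(ids) for p, ids in phrase_docs.items() if len(ids) >= min_docs}


def stc(docs, max_len=4, min_docs=2, top_k=500):
    """Group tokenized documents in the style of Suffix Tree Clustering."""
    bases = base_clusters(docs, max_len, min_docs)
    # Hypothetical score: number of documents times phrase length (capped at 6),
    # so longer shared phrases rank higher; keep only the top_k base clusters.
    ranked = sorted(
        bases.items(),
        key=lambda item: len(item[1]) * min(len(item[0]), 6),
        reverse=True,
    )[:top_k]

    parent = list(range(len(ranked)))  # union-find over base clusters

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Merge rule: link two base clusters when their shared documents make up
    # more than half of each cluster; final clusters are connected components.
    for a in range(len(ranked)):
        for b in range(a + 1, len(ranked)):
            docs_a, docs_b = ranked[a][1], ranked[b][1]
            shared = len(docs_a & docs_b)
            if shared / len(docs_a) > 0.5 and shared / len(docs_b) > 0.5:
                parent[find(a)] = find(b)

    clusters = defaultdict(lambda: {"docs": set(), "phrases": []})
    for idx, (phrase, doc_ids) in enumerate(ranked):
        cluster = clusters[find(idx)]
        cluster["docs"] |= doc_ids
        cluster["phrases"].append(" ".join(phrase))
    return sorted(clusters.values(), key=lambda c: len(c["docs"]), reverse=True)


if __name__ == "__main__":
    # Hypothetical, already tokenized documents; Chinese text would first need
    # word segmentation and stop-word removal.
    docs = [
        "pagerank link analysis ranks web pages".split(),
        "link analysis with pagerank".split(),
        "suffix tree clustering groups search results".split(),
        "search results grouped by suffix tree".split(),
    ]
    for cluster in stc(docs):
        print(sorted(cluster["docs"]), cluster["phrases"])
```

On these four documents the sketch returns two clusters, documents {0, 1} (labeled by phrases such as "link analysis") and {2, 3} ("suffix tree", "search results"). Enumerating n-grams costs time and memory proportional to total document length times `max_len`, which is the overhead a compact suffix tree is meant to avoid.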
Keywords/Search Tags: PageRank, Search engine, Web Data Mining, Text Classification