The Study And Application Of Web Text Data Mining Technology Based On The Approximate Pages Clustering Algorithm

Posted on:2006-04-23

Degree:Master

Type:Thesis

Country:China

Candidate:W Z Yang

Full Text:PDF

GTID:2168360155462626

Subject:Control theory and control engineering

Abstract/Summary:

With the rapid growth of information on the Internet, the development of data mining technology, and the emergence of XML, Web data mining has rapidly become a focus of information retrieval research. This thesis systematically studies Web data mining, search engine technology, XML, and text mining, and introduces their characteristics, principles, methods, and current state of research.

The Internet has become a main source of information, and people can retrieve relevant information from it with existing search engines. However, the returned information is typically massive and disordered, and users have difficulty finding what they actually care about. The thesis therefore studies in depth how to apply Web text data mining to the documents returned by existing search engines in order to obtain the search patterns that users prefer.

An algorithm for clustering Web pages, aimed at short texts, is proposed. It represents text features with the vector space model and applies fuzzy clustering analysis to the vocabulary that interests the user (which users can initialize according to their needs) to obtain a knowledge pattern. When a user searches, duplicate pages are removed using MD5, and the remaining pages are clustered and ordered so that the user can select the pages of interest to explore. This greatly improves the "precision" of retrieval. To guarantee "recall", the algorithm can cluster pages retrieved from several search engines. Finally, so that the page clusters of greatest interest to the user are ranked first, and taking into account the validity of users and their interests, a Web access sequence mining algorithm based on Markov chains is proposed and used to order the approximate page clusters.

The algorithm was found to greatly improve search efficiency while maintaining recall and precision. Because it targets short-text data mining, its time and space complexity is not high, so it can be regarded as a practical and effective information retrieval technique. Based on these ideas, an intelligent search engine system is designed. It runs quickly, attends to both "recall" and "precision", and has high search efficiency. The system has been successfully used in TW-OA, and practice shows that the results of this thesis are practicable and valid.
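The abstract names two concrete building blocks: MD5-based removal of repeated pages and a vector space representation of text. The following is a minimal sketch of those two steps only, not the thesis's implementation. The function names, whitespace tokenization, and raw term-frequency weighting are assumptions made for illustration, and a plain cosine-similarity ranking against the user's interest vocabulary stands in for the fuzzy clustering and Markov chain ordering, which the abstract does not specify in enough detail to reproduce.

```python
import hashlib
import math
from collections import Counter


def dedupe_pages(pages):
    """Drop pages whose text has the same MD5 digest as an earlier page."""
    seen = set()
    unique = []
    for text in pages:
        digest = hashlib.md5(text.encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(text)
    return unique


def term_vector(text):
    """Bag-of-words term-frequency vector (assumed weighting, whitespace tokens)."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_interest(pages, interest_terms):
    """De-duplicate pages, then order them by similarity to the interest vocabulary.

    This simple ranking is a stand-in for the clustering and ordering steps
    described in the abstract; it is not the thesis's algorithm.
    """
    query = Counter(t.lower() for t in interest_terms)
    unique = dedupe_pages(pages)
    return sorted(
        unique,
        key=lambda p: cosine_similarity(term_vector(p), query),
        reverse=True,
    )


if __name__ == "__main__":
    results = [
        "web text mining survey",
        "cooking recipes for pasta",
        "web text mining survey",  # exact duplicate, removed by MD5
        "clustering web pages",
    ]
    for page in rank_by_interest(results, ["web", "mining"]):
        print(page)
```

MD5 matching only removes byte-identical texts, so in this sketch near-duplicates that differ in markup or whitespace would survive unless the text is normalized first.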

Keywords/Search Tags:

Information searching, Web text data mining, Text clustering, Approximate pages clustering, Intelligent search engine system

Related items

1	Text Clustering And Its Application In Web Community Search Engine
2	Research On Enterprise Competitive Intelligence Collection System Based On Web Text Mining
3	A SOM-based Text Clustering And Apply To Search Result
4	The Research And Implement Of The Meta Search Engine And The Clustering Algorithm
5	The Research And Implementation Of Modern Enterprise Intelligence Information System Based On Text Mining
6	Research On Text Mining Based Web Information Retrieval
7	Research Based On P2PKM Desktop Searching Optimization
8	Cluster Analysis Application And Research Of Text Mining
9	Based On The Content Of The Chinese Web Document Clustering Method Research And Application
10	Research And Application On Technologies Of Text Clustering Oriented To Enterprise Competitive Intelligence