
Research And Implementation Of Key Technologies On Web Text Mining

Posted on: 2010-04-17
Degree: Master
Type: Thesis
Country: China
Candidate: J Zhou
Full Text: PDF
GTID: 2178330332498589
Subject: Computer application technology
Abstract/Summary:
The rapid development of the Internet has driven rapid growth in web information, making the web the largest information source in the world. Because web resources expand quickly and web information is dispersed and heterogeneous, finding knowledge in it is difficult. The Internet is now a distributed information space of billions of web pages that conceals a large amount of knowledge people urgently need. How to extract knowledge from this abundant web information has become a research direction of its own, giving rise to the field of web data mining. Web text mining is one branch of web data mining. It can improve the efficiency with which people obtain web information, divide web text into categories, help people quickly locate the knowledge they want, and extract valuable knowledge from web resources.

This paper studies and applies web text mining technology to mine web information and classify web text, designs and implements a web text mining prototype system with its overall functions and structure described in detail, and discusses the key technologies currently popular in web text mining. The main research content is:

1. Introduce the background of web text mining, analyze the research context and current state of the field, and discuss its significance.
2. Analyze and discuss the key technologies in the web text mining process: Chinese word segmentation, weight calculation methods, and feature representation and extraction approaches for web text.
3. Discuss some common web text classification and clustering algorithms: the k-nearest neighbor (KNN) algorithm, the naive Bayes method, the self-organizing map algorithm, and the K-means algorithm, and examine the advantages and disadvantages of the K-means algorithm and the genetic algorithm.
Based on this research, the paper proposes a clustering algorithm that combines K-means with a genetic algorithm, and carries out experiments, judged with mining evaluation methods, to verify the feasibility of the algorithm. The paper also describes the design and implementation details of the web text mining prototype system built on this research.
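The abstract does not spell out how K-means and the genetic algorithm are combined, so the following is only a minimal sketch of one common pairing, not the thesis's method: a genetic algorithm searches for good initial centroids (addressing K-means's sensitivity to initialization), and standard K-means then refines the best candidate. The function names, the chromosome encoding (k point indices), the fitness (sum of squared errors), and all parameter values are assumptions made for illustration. In a text mining setting the points would be document feature vectors, for example term weights.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def assign(points, centroids):
    """For each point, return the index of its nearest centroid."""
    return [min(range(len(centroids)), key=lambda c: sq_dist(p, centroids[c]))
            for p in points]

def sse(points, centroids):
    """Sum of squared distances from each point to its nearest centroid."""
    return sum(min(sq_dist(p, c) for c in centroids) for p in points)

def kmeans(points, centroids, max_iter=100):
    """Standard K-means refinement starting from the given centroids."""
    for _ in range(max_iter):
        labels = assign(points, centroids)
        updated = []
        for c, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                updated.append(tuple(sum(m[d] for m in members) / len(members)
                                     for d in range(len(old))))
            else:
                updated.append(old)  # keep an empty cluster's centroid in place
        if updated == centroids:
            break
        centroids = updated
    return centroids

def ga_kmeans(points, k, pop_size=20, generations=30, mutation_rate=0.2, seed=0):
    """Hypothetical hybrid: GA picks initial centroids, K-means refines them."""
    rng = random.Random(seed)
    n = len(points)
    # Assumed encoding: a chromosome is k distinct point indices used as centroids.
    population = [rng.sample(range(n), k) for _ in range(pop_size)]

    def fitness(chrom):
        return sse(points, [points[i] for i in chrom])  # lower is better

    for _ in range(generations):
        population.sort(key=fitness)
        survivors = population[:pop_size // 2]       # selection: keep the best half
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.sample(survivors, 2)
            cut = rng.randrange(1, k) if k > 1 else 0
            child = a[:cut] + b[cut:]                # one-point crossover
            if rng.random() < mutation_rate:
                child[rng.randrange(k)] = rng.randrange(n)  # mutation
            if len(set(child)) < k:                  # repair duplicate indices
                child = rng.sample(range(n), k)
            children.append(child)
        population = survivors + children

    best = min(population, key=fitness)
    centroids = kmeans(points, [points[i] for i in best])
    return centroids, assign(points, centroids)

# Toy usage: two well-separated groups of 2-D points.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
centroids, labels = ga_kmeans(points, k=2)
```

The genetic search here only chooses starting points; the final cluster centers always come from K-means, so the result is never worse than running K-means from the best candidate the search found.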
Keywords/Search Tags: Data Mining, Web Text Mining, Chinese Word Segmentation, Feature Selection