
Research On Technologies Of Web Text Mining Oriented To Enterprise Competitive Intelligence

Posted on: 2012-05-29
Degree: Master
Type: Thesis
Country: China
Candidate: W Wang
Full Text: PDF
GTID: 2248330395955561
Subject: Computer application technology
Abstract/Summary:
The Internet has developed with surprising rapidity. The Web, as the main platform for publishing and processing information, now holds massive information resources. How to discover useful information and knowledge patterns and put them to good use has therefore long been a direction of research. Web text mining technology emerged in response, and how to use it to enhance the quality of information is the focus of this paper.
By analyzing the structure of web pages, an HTML document tree parsing algorithm is designed and implemented in this paper, which can be used to extract content that is valuable to companies from web pages. A dictionary-based statistical word segmentation algorithm is used for text processing, after which meaningless words are removed from the text. Based on an analysis of existing keyword extraction methods, a weighting method based on word statistics and distribution is introduced and used to extract keywords from the text. Taking into account keywords, the position of each sentence in the article, special tags, and other factors, the summary is produced automatically by extracting sentences directly from the text; an auto-abstract algorithm based on this statistical method implements the automatic summary extraction. Duplicated texts are removed by calculating the longest common subsequence of feature-based sentences. The problems of the SVM classifier are then analyzed: multi-class classification is decomposed using SVM classifiers combined with a binary decision tree, an SVM decision tree generation algorithm is designed, and text classification mining is accomplished.
Applying the technology studied in this paper, a Web text mining module is designed and implemented to meet the needs of an enterprise competitive intelligence analysis and mining service system.
Keywords/Search Tags: Competitive Intelligence, Text Mining, Hypertext Preprocessor, Text classification
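The abstract states that duplicated texts are removed by calculating a longest common subsequence (LCS), but it gives no details. The following is only a minimal sketch of that general idea, not the thesis's algorithm. Several things are assumptions made for illustration: the whitespace tokenization (standing in for the thesis's feature-based sentences), the normalization by the longer sequence, the 0.8 threshold, and the names `lcs_length`, `lcs_similarity`, and `remove_duplicates`.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two sequences."""
    # Standard dynamic programme, keeping only the previous row.
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcs_similarity(a, b):
    """LCS length divided by the longer length (assumed normalization)."""
    longest = max(len(a), len(b))
    return lcs_length(a, b) / longest if longest else 0.0


def remove_duplicates(texts, threshold=0.8):
    """Keep a text only if it is not too similar to any text already kept.

    The threshold is a hypothetical value; tokens are whitespace-separated
    words, a stand-in for whatever features the thesis actually uses.
    """
    kept = []
    for text in texts:
        tokens = text.split()
        if all(lcs_similarity(tokens, k.split()) < threshold for k in kept):
            kept.append(text)
    return kept


if __name__ == "__main__":
    docs = [
        "the market share of company A rose sharply",
        "the market share of company A rose sharply today",
        "new product launch announced",
    ]
    for doc in remove_duplicates(docs):
        print(doc)
```

In this sketch the second document is dropped because it shares 8 of its 9 tokens, in order, with the first. Comparing every text against every kept text is quadratic in the number of texts, so a real system would likely need a cheaper pre-filter before running the LCS.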