Research And Implementation On Key Technologies Of Web Text Mining Oriented To Enterprise Competitive Intelligence

Posted on: 2011-10-04 | Degree: Master | Type: Thesis
Country: China | Candidate: J J Wang | Full Text: PDF
GTID: 2178330332488377 | Subject: Computer application technology
Abstract/Summary:
With the rapid development of the Internet, web resources have become an important source of enterprise competitive intelligence. However, the intelligence text obtained from web resources is too voluminous to be read and analyzed conveniently, and it contains a great deal of duplicated content; both problems greatly reduce the value of the information. How to use web text mining technologies to improve the quality of this information is the focus of this thesis. By analyzing the structure of web pages, an HTML document tree parsing algorithm is designed and implemented to extract useful text content from web pages. The text is segmented into words using a dictionary-based statistical word segmentation algorithm, and meaningless words are then removed from the segmented text. Building on an analysis of existing keyword extraction methods, keywords are extracted by calculating weights from word statistics and word distribution. A summary is produced automatically by extracting sentences directly from the text, taking into account keywords, the position of each sentence in the article, special tags, and other factors; this statistics-based auto-abstract algorithm implements the automatic summary extraction. Duplicated texts are removed by calculating the longest common subsequence of feature sentences. The problems of the existing SVM classifier are then analyzed: multi-class classification is handled by combining SVM classifiers with a binary decision tree, an SVM decision tree generation algorithm is designed, and text classification mining is accomplished. Applying the technologies studied in this thesis, a web text mining system that serves the needs of enterprise competitive intelligence analysis and mining is designed and implemented.
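The duplicate-removal step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say how feature sentences are selected, how the longest common subsequence (LCS) is normalised, or what threshold is used, so the function names, the character-level comparison, the normalisation by the longer string, and the 0.8 threshold below are all assumptions made for illustration, not the thesis's actual method.

```python
from typing import List


def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b (row-by-row DP)."""
    if len(a) < len(b):
        a, b = b, a  # iterate rows over the longer string, keep rows short
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcs_similarity(a: str, b: str) -> float:
    """LCS length divided by the longer string's length (assumed normalisation)."""
    if not a or not b:
        return 0.0
    return lcs_length(a, b) / max(len(a), len(b))


def keep_indices(feature_sentences: List[str], threshold: float = 0.8) -> List[int]:
    """Return indices of texts to keep.

    Each text is represented by one feature sentence (hypothetical input).
    A text is dropped when its feature sentence reaches the similarity
    threshold against any text that was already kept.
    """
    kept: List[int] = []
    for i, sentence in enumerate(feature_sentences):
        if all(lcs_similarity(sentence, feature_sentences[k]) < threshold
               for k in kept):
            kept.append(i)
    return kept
```

For example, `keep_indices(["acme launches new phone today", "acme launches new phone today!", "rival cuts prices"])` keeps the first and third texts, because the second differs from the first by a single character. The pairwise comparison is quadratic in the number of texts; a real system would likely pre-filter candidates before running the LCS.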
Keywords/Search Tags: Competitive Intelligence, Text Mining, Deletion of Duplicated Text, Text Classification