
An Analysis Of The Tendency Of Internet

Posted on: 2016-04-25
Degree: Master
Type: Thesis
Country: China
Candidate: B B Zhang
Full Text: PDF
GTID: 2208330470964916
Subject: Applied Mathematics
Abstract/Summary:
Stock comment text classification applies text categorization to the stock domain. It shares the features of general text classification but also has characteristics of its own. This thesis studies the subjective (tendency) aspects of text data mining and carries out a series of in-depth studies toward a mining system for stock comment text. The specific designs are as follows:
(1) Construction of the stock dictionary. Building on methods that measure correlation by mutual information, an improved mutual information measure between new words and seed words is presented. Candidates are then selected manually, yielding an extended dictionary of stock tendency words.
(2) Pre-processing. A customized web crawler is designed and implemented for this research. A bidirectional maximum matching algorithm based on a binary-search dictionary is proposed, which greatly improves efficiency and yields a fast bidirectional maximum matching word segmentation algorithm.
(3) Feature selection. Based on the analysis and implementation of several common feature selection algorithms, an improved chi-square statistic algorithm is proposed. It retains the chi-square statistic's advantage of identifying discriminative features while avoiding the influence of low-frequency features. Experiments show that the improved chi-square algorithm is effective for classifying web stock comments.
(4) Models. Based on an analysis of SVM theory, an improved SVM algorithm is proposed that eliminates the part of the training data likely to be noise points, in order to prevent overfitting.
On the one hand, the thesis presents the theoretical algorithms for text tendency mining in mathematical form, with derivations. On the other hand, it emphasizes their application value through extensive engineering practice. It describes the theoretical framework of text mining in some detail, together with several important algorithms that contain innovative elements. Finally, experiments verify that the results of automated machine tendency analysis are highly consistent with those of manual reading analysis.
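As an illustration of item (2), the following is a minimal sketch of standard bidirectional maximum matching with dictionary lookup by binary search. It is not the thesis's implementation: the function names, the sorted-list dictionary, and the tie-breaking rule (fewer tokens, then fewer single-character tokens, then the backward result) are assumptions based on the common textbook form of the technique.

```python
from bisect import bisect_left


def in_dict(sorted_words, word):
    """Binary-search membership test on a sorted list of dictionary words."""
    i = bisect_left(sorted_words, word)
    return i < len(sorted_words) and sorted_words[i] == word


def forward_mm(text, sorted_words, max_len):
    """Forward maximum matching: scan left to right, longest candidate first."""
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # A single character is always accepted as a fallback.
            if size == 1 or in_dict(sorted_words, piece):
                tokens.append(piece)
                i += size
                break
    return tokens


def backward_mm(text, sorted_words, max_len):
    """Backward maximum matching: scan right to left, longest candidate first."""
    tokens, j = [], len(text)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            piece = text[j - size:j]
            if size == 1 or in_dict(sorted_words, piece):
                tokens.append(piece)
                j -= size
                break
    tokens.reverse()
    return tokens


def bidirectional_mm(text, words):
    """Run both scans and keep the segmentation preferred by the heuristic."""
    sorted_words = sorted(set(words))
    max_len = max((len(w) for w in sorted_words), default=1)
    fwd = forward_mm(text, sorted_words, max_len)
    bwd = backward_mm(text, sorted_words, max_len)
    # Assumed heuristic: prefer fewer tokens, then fewer single-character tokens.
    if len(fwd) != len(bwd):
        return fwd if len(fwd) < len(bwd) else bwd
    singles_fwd = sum(1 for t in fwd if len(t) == 1)
    singles_bwd = sum(1 for t in bwd if len(t) == 1)
    return fwd if singles_fwd < singles_bwd else bwd


# Hypothetical toy dictionary, for illustration only.
toy_words = ["研究", "研究生", "生命", "命", "的", "起源"]
print(bidirectional_mm("研究生命的起源", toy_words))
```

Each dictionary probe costs O(log n) comparisons on the sorted list. A hash set would also work; the sorted list is used here only to mirror the binary-search dictionary mentioned in the abstract.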
Keywords/Search Tags: stock, text classification, the tendency of text, SVM