
Research and Application of Feature Selection and the SVM Algorithm in Stock Market News

Posted on: 2015-02-23
Degree: Master
Type: Thesis
Country: China
Candidate: C L Wen
Full Text: PDF
GTID: 2268330428965555
Subject: Computer software and theory
Abstract/Summary:
In today's information age, as network speeds and storage capacity keep growing, transmitting and storing text in bulk has become commonplace. Traditional information retrieval techniques have not kept pace, and the problem of promptly finding the desired information in fast-growing text databases is increasingly prominent. Text mining has developed rapidly in recent years to address this need. It covers text clustering, text classification, information extraction, and other tasks, and text classification is currently one of the most active research topics in data mining. Text classification has already been applied successfully in many fields, such as the spam filtering used by mail servers and the search technology used by web search engine companies.
Text classification assigns texts of unknown category to categories according to classification rules, where a classification rule refers to the characteristic information that distinguishes one kind of text from another. To classify text automatically, the rules are encoded in numerical form in a classifier, which then determines the category of each text to be classified. Text classification is a form of supervised machine learning; that is, the categories of the training samples are known before training.
Typically, the text classification process consists of preprocessing the text, extracting feature items, generating the classifier, testing classification performance, and evaluating the classification results.
Of these steps, feature item extraction and classifier generation are the main subjects of this thesis, because the choice of feature extraction method and classification algorithm has a large effect on classification performance.
The thesis first describes the background, significance, and current state of text classification research. Compared with other countries, text classification research in China started late, but as Internet technology has advanced, it has received increasing attention from government bodies and computer research institutions at all levels, and a series of Chinese text classification techniques has been developed.
The thesis then summarizes the relevant techniques, including text preprocessing, feature item extraction, and classification algorithms. To address the shortcomings of using TF*IDF to weight feature words, an improved TF*IDF weight calculation method is proposed. In addition, to reduce the time and space complexity of training while limiting the loss of classification accuracy caused by dimensionality reduction, the thesis introduces the concept of feature correlation and uses the correlation coefficient to measure how strongly feature items are related. When the correlation coefficient between two feature items exceeds an agreed threshold, the secondary feature item is replaced by the primary one, which reduces the redundancy caused by synonymous or near-synonymous feature items in the feature set.
Next, the thesis studies how to solve multi-class classification with the SVM algorithm. The binary tree method is the most widely used approach, but different binary tree structures produce different results. At present, when a binary tree is generated, the position of a class in the tree is usually determined by either the distance between classes or the distribution of the class samples.
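As a rough illustration of the weighting and redundancy-reduction steps described above, the following Python sketch computes classic TF*IDF weights and then drops any feature whose Pearson correlation with an already kept feature exceeds a threshold. It does not reproduce the thesis's improved TF*IDF formula, which the abstract does not give. The function names, the toy documents, the 0.9 threshold, and the rule that the first-seen feature counts as the primary one are all assumptions made for this example.

```python
from math import log, sqrt

def tfidf_matrix(docs):
    """Classic tf * log(N / df) weights; docs is a list of non-empty token lists."""
    vocab = sorted({t for d in docs for t in d})
    n = len(docs)
    df = {t: sum(1 for d in docs if t in d) for t in vocab}
    rows = [[d.count(t) / len(d) * log(n / df[t]) for t in vocab] for d in docs]
    return vocab, rows

def pearson(a, b):
    """Pearson correlation coefficient; returns 0.0 for a constant column."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0
    return cov / sqrt(va * vb)

def drop_correlated(vocab, rows, threshold=0.9):
    """Keep a feature only if its correlation with every kept feature is <= threshold.

    Assumption: the first-seen feature of a correlated pair is treated as primary.
    """
    cols = [[r[j] for r in rows] for j in range(len(vocab))]
    kept = []
    for j in range(len(vocab)):
        if all(pearson(cols[j], cols[k]) <= threshold for k in kept):
            kept.append(j)
    return [vocab[j] for j in kept]

# Hypothetical toy corpus: "shares"/"stock" and "bond"/"yield" always co-occur.
docs = [
    ["shares", "stock", "rise"],
    ["shares", "stock", "fall"],
    ["bond", "yield", "rise"],
    ["bond", "yield", "fall"],
]
vocab, rows = tfidf_matrix(docs)
kept = drop_correlated(vocab, rows, threshold=0.9)
print(kept)
```

On this toy corpus the two perfectly co-occurring pairs collapse to one feature each, so `kept` is `['bond', 'fall', 'rise', 'shares']`. The signed coefficient is used deliberately: "rise" and "fall" are negatively correlated here and are both retained.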
This thesis proposes an improved method that considers both the distance between classes and the distribution of class samples when generating the binary tree for multi-class classification. Experimental analysis and comparison indicate that the improved algorithm generalizes better.
Finally, an automatic text classification system for stock market industry news is designed. Its feature extraction module uses the improved TF*IDF weight calculation and the dimensionality reduction method described in this thesis, and its classification module uses the proposed improved binary-tree-based multi-class SVM. The thesis closes by noting the issues that this study did not explore in depth and by pointing out directions for further research.
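The binary tree decomposition of a multi-class problem can be sketched as follows. This example uses only the conventional distance-based ordering (the class whose centroid is farthest from the others is split off first), not the thesis's improved criterion, whose details the abstract does not state. Each node's classifier is a nearest-centroid rule standing in for a trained binary SVM. The names `build_tree`, `predict`, and `train_centroid_rule`, the category labels, and the 2-D points are hypothetical.

```python
from math import dist

def centroid(points):
    """Component-wise mean of a list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))

def train_centroid_rule(pos, neg):
    """Stand-in for a binary SVM: True when x is closer to the positive centroid."""
    cp, cn = centroid(pos), centroid(neg)
    return lambda x: dist(x, cp) <= dist(x, cn)

def build_tree(samples, train_binary=train_centroid_rule):
    """samples maps a string label to its points. A leaf is a label; a node is a dict."""
    labels = sorted(samples)
    if len(labels) == 1:
        return labels[0]
    cents = {c: centroid(samples[c]) for c in labels}
    # Distance-based ordering: split off the class farthest from all the others.
    far = max(labels, key=lambda c: sum(dist(cents[c], cents[o]) for o in labels if o != c))
    rest = {c: pts for c, pts in samples.items() if c != far}
    negatives = [p for pts in rest.values() for p in pts]
    return {
        "label": far,
        "is_label": train_binary(samples[far], negatives),  # one binary classifier per node
        "rest": build_tree(rest, train_binary),
    }

def predict(tree, x):
    """Walk from the root; stop at the first node whose classifier accepts x."""
    while isinstance(tree, dict):
        if tree["is_label"](x):
            return tree["label"]
        tree = tree["rest"]
    return tree

# Hypothetical 2-D feature vectors for three news categories.
samples = {
    "banking": [(0, 0), (0, 1), (1, 0), (1, 1)],
    "energy": [(5, 5), (5, 6), (6, 5), (6, 6)],
    "tech": [(12, 0), (12, 1), (13, 0), (13, 1)],
}
tree = build_tree(samples)
print(predict(tree, (1, 0.5)), predict(tree, (5, 6)), predict(tree, (12, 1)))
```

A tree over k classes needs only k - 1 binary classifiers, and a prediction evaluates at most k - 1 of them. Passing a different `train_binary` callable, for example one that fits a binary SVM, changes the node classifier without touching the tree logic.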
Keywords/Search Tags: text classification, support vector machine, binary tree, term frequency, feature items