The Research And Application In The Stock Market News Of Feature Selection And SVM Algorithm

Posted on:2015-02-23

Degree:Master

Type:Thesis

Country:China

Candidate:C L Wen

GTID:2268330428965555

Subject:Computer software and theory

Abstract/Summary:

In today's information age, network transmission speeds keep rising and storage capacity keeps doubling, so transmitting and storing massive volumes of text has become commonplace. Traditional information-access technology has clearly not kept pace, and retrieving the desired text in a timely way from fast-growing text databases is an increasingly pressing problem. Text mining has developed rapidly in recent years precisely to meet this demand. It covers many tasks, including text clustering, text classification, and information extraction, and text classification is currently one of the research hotspots in data mining. Text classification has already been applied successfully in many fields, such as the spam filtering used by mail servers and the search technology used by web search engine companies. Text classification assigns texts of unknown category to predefined categories according to classification rules, where a classification rule captures the features that distinguish one category of text from the others. To classify text automatically, these rules are encoded as a classifier, which then determines the category of each text. Text classification is a form of supervised machine learning: the categories of the training samples are known before training. A typical text classification process consists of the following steps: preprocessing the text, extracting feature items, generating the classifier, testing classification performance, and evaluating the classification results.
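For illustration, the preprocessing and feature-extraction steps can be sketched with the standard TF*IDF weighting, which is the baseline this thesis sets out to improve. This is a minimal sketch under stated assumptions: the abstract does not describe the improved weighting, so the code uses the common form tf(t, d) × log(N / df(t)), where N is the number of documents and df(t) is the number of documents containing term t. The tokenizer, the stop-word list, and the function names `preprocess` and `tfidf` are hypothetical, and Chinese text would need a word segmenter instead of the regular-expression split used here.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list, for illustration only.
STOP_WORDS = {"the", "a", "of", "and", "in", "to"}


def preprocess(text):
    """Lower-case the text, split it into alphanumeric tokens, and drop stop words."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def tfidf(docs):
    """Return one {term: weight} dict per document, weight = tf * log(N / df)."""
    tokenized = [preprocess(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        # Term frequency is normalized by document length.
        vectors.append({
            term: (c / total) * math.log(n_docs / df[term])
            for term, c in counts.items()
        })
    return vectors
```

For example, `tfidf(["Stock prices rise", "Stock prices fall", "Bank profits rise"])` gives "fall" a higher weight in the second document than "stock", because "fall" occurs in only one document. A term that occurs in every document gets weight zero, which is one of the known weaknesses of plain TF*IDF.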
Feature item extraction and classifier generation are the main topics of this paper, because the choice of feature extraction method and classification algorithm has a large impact on classification performance. The paper first describes the background and significance of text classification research and its current status. Compared with other countries, text classification research in China started late, but with breakthroughs in Internet technology it has received increasing attention at all levels, and various computer research institutions have developed a series of Chinese text classification techniques. The paper then summarizes the related techniques, including text preprocessing, feature item extraction, and classification algorithms. To address the shortcomings of computing feature-word weights with TF*IDF, it proposes an improved TF*IDF weight calculation method. In addition, to reduce the time and space complexity of training effectively while limiting the loss of classification accuracy caused by dimensionality reduction, the paper introduces the concept of feature correlation and uses the correlation coefficient to measure the degree of association between feature items. When the correlation coefficient between two feature items exceeds a predefined threshold, the primary feature item replaces the secondary one, which removes redundant feature items such as synonyms and near-synonyms from the feature set. Next, the paper studies how to perform multi-class classification with SVMs. The binary tree method is the most widely used approach, but different binary tree structures produce different results. Existing methods usually build the binary tree by determining the position of each class or sample in the tree nodes from the distances between the sample distributions.
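The correlation-based redundancy reduction described above can be sketched as follows, assuming Pearson's correlation coefficient computed over each feature's per-document weights. The abstract does not specify which correlation coefficient is used, how the primary feature of a correlated pair is chosen, or the threshold value, so these choices are assumptions: features are visited in descending order of total weight, a feature whose correlation with an already-kept feature exceeds a hypothetical threshold of 0.9 is replaced by that kept feature, and the replaced feature's weight is added to its replacement. The function names `pearson`, `merge_correlated`, and `reduce_vector` are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length lists; 0.0 if either is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def merge_correlated(features, vectors, threshold=0.9):
    """Greedily replace secondary features by highly correlated primary ones.

    features: list of term names.
    vectors: list of per-document {term: weight} dicts.
    Returns (kept_terms, replaced_by), where replaced_by maps each
    secondary term to the primary term that replaces it.
    """
    # One column of weights per feature, across all documents.
    column = {f: [v.get(f, 0.0) for v in vectors] for f in features}
    # Assumption: the feature with the larger total weight is the primary one.
    order = sorted(features, key=lambda f: sum(column[f]), reverse=True)
    kept, replaced_by = [], {}
    for f in order:
        match = next(
            (k for k in kept if pearson(column[f], column[k]) > threshold), None
        )
        if match is None:
            kept.append(f)
        else:
            replaced_by[f] = match
    return kept, replaced_by


def reduce_vector(vec, replaced_by):
    """Rewrite one document vector, folding each replaced term into its keeper."""
    out = {}
    for term, w in vec.items():
        key = replaced_by.get(term, term)
        out[key] = out.get(key, 0.0) + w
    return out
```

Summing the weights keeps the evidence carried by a near-synonym instead of discarding it; simply dropping the secondary feature is an equally plausible reading of the abstract.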
This paper proposes an improved method that, when generating the binary tree for multi-class classification, takes into account both the distribution of samples within each class and the separation between classes. Experimental analysis and comparison indicate that the improved algorithm performs better. Finally, the paper designs an automatic classification system for stock market industry news: its feature extraction module uses the improved TF*IDF weight calculation and the dimensionality reduction method proposed in this paper, and its classification module uses the proposed improved binary-tree SVM for multi-class classification. The paper concludes by noting the issues this study could not explore in depth and pointing out directions for further research.
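A partial binary-tree multi-class SVM of the kind discussed above can be sketched in pure Python. This sketch implements the distance-based baseline, not the improved construction proposed in the thesis, whose scoring rule the abstract does not give. At each node, the class whose centroid lies farthest from the centroid of the remaining classes is split off first by a binary linear SVM, and the remaining classes are handled recursively. The binary SVMs are trained with a Pegasos-style subgradient method, a swapped-in choice made for brevity; the parameters `lam` and `epochs`, the toy data, and all function names are hypothetical.

```python
import math
import random


def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Pegasos-style subgradient training of a linear SVM; y holds +1/-1 labels.

    A constant 1.0 feature is appended to every sample to learn a bias term.
    """
    rng = random.Random(seed)
    data = [(list(x) + [1.0], t) for x, t in zip(X, y)]
    w = [0.0] * len(data[0][0])
    step = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, t in data:
            step += 1
            eta = 1.0 / (lam * step)
            margin = t * sum(wi * xi for wi, xi in zip(w, x))
            # Shrink for the regularizer, then add the hinge-loss subgradient.
            w = [(1 - eta * lam) * wi for wi in w]
            if margin < 1:
                w = [wi + eta * t * xi for wi, xi in zip(w, x)]
    return w


def decision(w, x):
    """Signed score of sample x under weights w (bias is the last weight)."""
    return sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))


def centroid(points):
    return [sum(col) / len(points) for col in zip(*points)]


def dist(a, b):
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def build_tree(X, y, classes):
    """Recursively build a partial binary tree of SVMs.

    Returns a class label (leaf) or a tuple (weights, split_class, subtree).
    """
    if len(classes) == 1:
        return classes[0]
    cents = {c: centroid([x for x, t in zip(X, y) if t == c]) for c in classes}

    def separation(c):
        # Distance-based rule: how far class c lies from the other classes.
        rest = centroid([cents[o] for o in classes if o != c])
        return dist(cents[c], rest)

    first = max(classes, key=separation)
    w = train_linear_svm(X, [1 if t == first else -1 for t in y])
    rest_idx = [i for i, t in enumerate(y) if t != first]
    rest_classes = [c for c in classes if c != first]
    subtree = build_tree([X[i] for i in rest_idx], [y[i] for i in rest_idx], rest_classes)
    return (w, first, subtree)


def predict(tree, x):
    """Walk down the tree until one node's SVM claims the sample."""
    while isinstance(tree, tuple):
        w, first, rest = tree
        if decision(w, x) > 0:
            return first
        tree = rest
    return tree


# Usage with three toy 2-D classes (hypothetical data).
X = [[-6.5, 0.0], [-5.5, 0.0], [-6.0, 0.5], [-6.0, -0.5],
     [1.5, 2.0], [2.5, 2.0], [2.0, 2.5], [2.0, 1.5],
     [1.5, -2.0], [2.5, -2.0], [2.0, -1.5], [2.0, -2.5]]
y = ["A"] * 4 + ["B"] * 4 + ["C"] * 4
tree = build_tree(X, y, ["A", "B", "C"])
print(predict(tree, [2.0, 2.0]))
```

Because the tree order is decided entirely by `separation`, replacing that function with a score that also reflects how spread out each class is would be one way to experiment with a distribution-aware construction; that variant is not shown here.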

Keywords/Search Tags:

text classification, support vector machine, binary tree, term frequency, feature items

Related items

1	Text Classification Research Based On Support Vector Machine
2	Analysis And Application For Web Text Classification Based On Support Vector Machine
3	Research On Text Classification Algorithm Based On SVM
4	Research And Application Of Remote Sensing Image Classification Based On Partial Binary Tree Twin Support Vector Machine
5	Research On Text Classification Method Based On Support Vector Machine
6	The Study Of Text Classification Based On Support Vector Machine
7	Research On Support Vector Machine Classification Algorithm For Multi-class Texts
8	The Binary Tree Of Multi-Class Support Vector Machine And The Application Of It In Image Semantic Classification
9	Study On Text Classification Based On Multi-class Support Vector Machines
10	Research Of Chinese Text Classification Based On Mixed Feature