Web Text KNN Classification Based On Rough Set And Its Application In Finance

Posted on: 2014-06-23
Degree: Master
Type: Thesis
Country: China
Candidate: W Wang
Full Text: PDF
GTID: 2268330398984428
Subject: Computer software and theory
Abstract/Summary:
With the globalization of financial markets and the wide adoption of computer network technology, the global financial market has begun to move onto the network. At the same time, the Internet has become the main source of financial information for enterprises, institutions, and individuals. Providers of professional or personal financial information services, faced with such vast and complex financial information resources on the Internet, inevitably meet a major challenge: how to classify and process financial data from the Internet quickly and in real time, and how to improve the efficiency and quality of Web financial data retrieval. Solving this improves a company's financial information service quality and strengthens its core competitiveness in the financial information services industry; it has also become a key problem in current academic research.
With the development of information and communication technology, automatic classification has become an effective tool for classifying financial information. Web text categorization is now an important research area in Chinese information processing. Its goal is to assign a text to the most suitable category on the basis of an analysis of its content, in order to improve the processing efficiency of text retrieval applications. Many methods exist for this task.
The K nearest neighbor (KNN) algorithm is considered one of the best classification algorithms under the vector space model. It is commonly used in automatic text classification, and its accuracy is high for low-dimensional text. When dealing with a large amount of high-dimensional text, however, traditional KNN must compute the similarity between the text to be classified and a large number of training samples, which increases computation and lowers classification efficiency.
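The KNN classification under the vector space model described above can be sketched as follows. This is a minimal illustration, not the thesis's implementation: it assumes raw term counts as vector weights, cosine similarity, and similarity-weighted voting, and the names `vectorize`, `cosine`, `knn_classify`, and the toy training set are all hypothetical.

```python
import math
from collections import Counter

def vectorize(text):
    """Turn a text into a sparse term-count vector (assumed weighting)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def knn_classify(query, training, k=3):
    """Label a query vector by similarity-weighted vote of its k nearest neighbours.

    training is a list of (vector, label) pairs. Every training sample is
    compared with the query, which is the cost the abstract refers to.
    """
    scored = sorted(
        ((cosine(query, vec), label) for vec, label in training),
        key=lambda pair: pair[0],
        reverse=True,
    )[:k]
    votes = Counter()
    for sim, label in scored:
        votes[label] += sim
    return votes.most_common(1)[0][0]

# Toy training set (invented for illustration only).
training = [
    (vectorize("stock market shares rise"), "equities"),
    (vectorize("shares fall as stock index drops"), "equities"),
    (vectorize("central bank raises interest rate"), "rates"),
    (vectorize("interest rate cut by bank"), "rates"),
]

label = knn_classify(vectorize("bank interest rate decision"), training, k=3)
```

Because each query is scored against every training vector and every shared term, the cost grows with both the number of training samples and the vocabulary size, which is why reducing the feature set matters for efficiency.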
To address these problems, rough sets have been used to perform attribute reduction on high-dimensional text information and remove redundant attributes, and several hybrid classification methods based on rough sets and KNN have been proposed, with research focused mainly on attribute reduction. Although these methods are considerably more efficient than the traditional KNN algorithm alone, there is still much room for improvement. Building on rough sets and the KNN algorithm, this thesis presents a new rough-set-based KNN classification model. The model introduces an improved discernibility matrix reduction method for attribute reduction and adopts a modified CHI and pattern aggregation method in the feature extraction stage, which greatly reduces the number of feature vectors. This reduces the data fed into the classification stage, improves the efficiency of the whole classification system, and lowers its time and space complexity. Comparative experiments show that the new rough-set-based KNN classification method is considerably more efficient than existing general rough-set-based KNN algorithms.
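As a rough illustration of CHI-based feature scoring, the sketch below computes the standard chi-square statistic for a term and a category. It shows the textbook formula only; the thesis's modified CHI and its pattern aggregation step are not specified in this abstract and are not reproduced here. The function name `chi_square` and the toy documents are assumptions.

```python
def chi_square(term, category, docs):
    """Standard chi-square score of a term with respect to a category.

    docs is a list of (set_of_terms, label) pairs.
    a: docs in the category containing the term
    b: docs outside the category containing the term
    c: docs in the category without the term
    d: docs outside the category without the term
    """
    a = b = c = d = 0
    for terms, label in docs:
        has_term = term in terms
        in_category = label == category
        if has_term and in_category:
            a += 1
        elif has_term:
            b += 1
        elif in_category:
            c += 1
        else:
            d += 1
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return n * (a * d - c * b) ** 2 / denom if denom else 0.0

# Toy corpus (invented for illustration only).
docs = [
    ({"stock", "rise"}, "equities"),
    ({"stock", "fall"}, "equities"),
    ({"rate", "bank"}, "rates"),
    ({"rate", "cut"}, "rates"),
]

score_stock = chi_square("stock", "equities", docs)
score_rise = chi_square("rise", "equities", docs)
```

Terms with higher scores are more strongly associated with the category, so keeping only the top-scoring terms shrinks the feature vectors passed to the classifier.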
The thesis analyzes the current state and problems of research on Web mining and Web text mining, and mainly studies the key technologies of Web text categorization, text classification methods, and text classification methods that combine rough sets with KNN. The main research work is as follows:
(1) Introduce Web mining, Web text mining, and Web text categorization, the basic theory of rough sets and related knowledge, and the key technologies in the Web text categorization process.
(2) Present a Web text categorization system model that combines rough sets and KNN.
(3) Present an improved discernibility matrix reduction algorithm based on rough sets.
(4) Present a KNN algorithm based on CHI feature extraction and a pattern aggregation method.
(5) Present a Web text categorization system for the financial domain, together with its experimental results and a comparative analysis.
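To make the idea of rough-set attribute reduction concrete, the sketch below builds a classical discernibility matrix and derives an approximate reduct with a simple greedy heuristic (repeatedly pick the attribute that occurs in the most remaining entries). This is a generic textbook-style technique swapped in for illustration; it is not the improved reduction algorithm proposed in the thesis, and the function names and the toy decision table are assumptions.

```python
from collections import Counter
from itertools import combinations

def discernibility_matrix(rows, decisions):
    """Return the non-empty entries of the discernibility matrix.

    rows is a list of dicts mapping attribute name to value; decisions holds
    the class label of each row. For every pair of rows with different
    labels, the entry is the set of attributes on which the rows differ.
    """
    entries = []
    for i, j in combinations(range(len(rows)), 2):
        if decisions[i] != decisions[j]:
            diff = frozenset(a for a in rows[i] if rows[i][a] != rows[j][a])
            if diff:
                entries.append(diff)
    return entries

def greedy_reduct(entries):
    """Greedy cover of the matrix entries; gives an approximate reduct."""
    remaining = list(entries)
    reduct = set()
    while remaining:
        counts = Counter(a for entry in remaining for a in entry)
        # Sort first so that ties are broken alphabetically and deterministically.
        best = max(sorted(counts), key=counts.get)
        reduct.add(best)
        remaining = [entry for entry in remaining if best not in entry]
    return reduct

# Toy decision table: binary term presence per document (invented data).
rows = [
    {"stock": 1, "rate": 0, "the": 1},
    {"stock": 1, "rate": 0, "the": 0},
    {"stock": 0, "rate": 1, "the": 1},
    {"stock": 0, "rate": 1, "the": 0},
]
decisions = ["equities", "equities", "rates", "rates"]

reduct = greedy_reduct(discernibility_matrix(rows, decisions))
```

In this toy table a single attribute is enough to tell the two classes apart, so the uninformative attribute `the` is dropped. The greedy cover is not guaranteed to be a minimal reduct in general.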
Keywords/Search Tags: Web text categorization, discernibility matrix, rough set, finance, KNN algorithm