
Research On Algorithms For Text Feature Selection

Posted on: 2011-07-22
Degree: Master
Type: Thesis
Country: China
Candidate: N Lin
Full Text: PDF
GTID: 2178330332956551
Subject: Computer software and theory

Abstract/Summary:
Data mining is an interdisciplinary field that fuses database technology, artificial intelligence, machine learning, and other research areas. It is the process of extracting implicit, previously unknown, but potentially useful information and knowledge from large volumes of practical data that may be incomplete, noisy, vague, or random. Text classification is an important research topic within data mining, and feature selection is a key technology and core issue in text classification.

In text classification, the feature space is high-dimensional and contains many irrelevant and redundant features, so many researchers have tried various methods to remove these features and obtain a near-optimal feature subset. Most widely used feature selection algorithms, however, are limited to removing irrelevant features and give relatively little consideration to redundant ones; as a result, although they can significantly reduce the dimensionality of the feature space, the classification results are not accurate enough.

This thesis systematically analyzes and summarizes several classical feature selection algorithms for text classification, and on that basis proposes new feature selection algorithms to address the problems identified.

First, it presents a text feature selection algorithm based on the idea of dynamic programming (DPFS). The algorithm considers feature relevance and redundancy jointly: combining dynamic programming with irrelevance and redundancy analysis of candidate feature subsets, it arrives at a near-optimal feature set. Experimental results show that, by storing the computed C-relevance and R-redundancy values within the dynamic-programming framework, the algorithm avoids a large amount of repeated computation and improves runtime performance.
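The abstract does not give the exact DPFS recurrence, but the caching idea it describes — computing each C-relevance and R-redundancy value once and reusing it — can be sketched as follows. This is an illustrative greedy selector in the mRMR style, using mutual information as a stand-in relevance/redundancy measure; the function names, toy data, and scoring rule are assumptions, not the thesis's actual algorithm.

```python
from collections import Counter
from functools import lru_cache
from math import log2

# Toy binary term-presence matrix: rows are documents, columns are features.
X = [
    (1, 1, 0, 1),
    (1, 1, 1, 0),
    (0, 0, 1, 1),
    (0, 1, 1, 0),
    (1, 0, 0, 1),
    (0, 0, 1, 0),
]
y = (1, 1, 0, 0, 1, 0)  # class labels

def mutual_info(a, b):
    """Mutual information (in bits) between two discrete sequences."""
    n = len(a)
    pa, pb, pab = Counter(a), Counter(b), Counter(zip(a, b))
    return sum((c / n) * log2((c / n) / ((pa[u] / n) * (pb[v] / n)))
               for (u, v), c in pab.items())

@lru_cache(maxsize=None)
def c_relevance(j):
    """C-relevance of feature j to the class; cached, so computed only once."""
    return mutual_info(tuple(row[j] for row in X), y)

@lru_cache(maxsize=None)
def _r(j, k):
    return mutual_info(tuple(row[j] for row in X),
                       tuple(row[k] for row in X))

def r_redundancy(j, k):
    """R-redundancy between features j and k (symmetric, stored once)."""
    return _r(min(j, k), max(j, k))

def select_features(k):
    """Greedily pick k features maximizing relevance minus mean redundancy."""
    remaining, chosen = set(range(len(X[0]))), []
    while len(chosen) < k and remaining:
        def score(j):
            if not chosen:
                return c_relevance(j)
            return c_relevance(j) - sum(r_redundancy(j, s)
                                        for s in chosen) / len(chosen)
        best = max(remaining, key=score)
        chosen.append(best)
        remaining.remove(best)
    return chosen
```

Because the cached values are reused across every subset evaluation, each relevance or redundancy score is computed at most once, which is the double-counting savings the experiments attribute to DPFS.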
Moreover, because DPFS analyzes both feature relevance and feature redundancy, it improves the accuracy of feature selection, which in turn greatly improves the accuracy of text classification.

Second, the thesis proposes an improved LAM feature selection algorithm (ILAMFS). This algorithm likewise considers feature relevance and redundancy jointly: on top of the irrelevance and redundancy analysis of the feature set, it performs a secondary redundancy analysis, yielding a feature set that is closer to optimal. In addition, the algorithm uses a linear computation scheme, which greatly improves its speed, and to address the difficulty of choosing a threshold it incorporates techniques such as the golden-section method and weighted averaging. Experimental results show that ILAMFS is effective at reducing data dimensionality, selecting the threshold, cutting the amount of computation in the dimensionality-reduction process, and improving the accuracy of feature selection.
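The abstract names the golden-section method as one way ILAMFS picks its threshold but gives no details. As a hedged illustration of that technique only, the sketch below runs a standard golden-section search over [0, 1] on a stand-in quadratic "error" function; in practice the objective would be something like a validation error measured at each candidate threshold, which is an assumption here.

```python
from math import sqrt

INV_PHI = (sqrt(5) - 1) / 2  # 1/phi, the golden ratio conjugate, about 0.618

def golden_section_min(f, a, b, tol=1e-6):
    """Minimize a unimodal function f on [a, b] by golden-section search."""
    c = b - (b - a) * INV_PHI  # left interior probe
    d = a + (b - a) * INV_PHI  # right interior probe
    while (b - a) > tol:
        if f(c) < f(d):
            # Minimum lies in [a, d]: shrink from the right.
            b, d = d, c
            c = b - (b - a) * INV_PHI
        else:
            # Minimum lies in [c, b]: shrink from the left.
            a, c = c, d
            d = a + (b - a) * INV_PHI
    return (a + b) / 2

# Stand-in objective: a threshold-dependent error minimized at t = 0.35.
err = lambda t: (t - 0.35) ** 2
best_threshold = golden_section_min(err, 0.0, 1.0)
```

Each iteration shrinks the search interval by the constant factor 1/phi, so the threshold is located to the desired tolerance with only logarithmically many objective evaluations, rather than scanning a grid of candidate thresholds.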
Keywords/Search Tags: Feature Selection, Redundancy, Weighted Average, Dynamic Programming