Study Of Component Analysis Algorithm And The Application In The Feature Extraction From Web Text

Posted on:2006-02-22

Degree:Master

Type:Thesis

Country:China

Candidate:Z P Zhang

Full Text:PDF

GTID:2168360152466598

Subject:Computer applications

Abstract/Summary:

The World Wide Web (WWW) is growing rapidly in both depth and breadth, and the volume of information it holds is increasing exponentially. Web text is usually expressed in HTML, so it is transformed into a feature vector that reflects the content of the text. Such representations share a common drawback: the feature vectors have extremely high dimensionality. Feature extraction is therefore essential to Web text mining. Based on the processing of Web text information, this paper studies feature extraction methods in depth from both the theoretical and the practical side.

The paper begins with the Web text mining model, describing its definition, characteristics, processing steps, and general techniques. It then discusses word segmentation, feature representation, and feature extraction in detail. Finally, it analyzes and improves text feature extraction, the core step of Web text mining, and puts forward an algorithm that combines SVD with gene analysis. It then improves the validity of the algorithm through experiments and proposes a genetic algorithm based on vector similarity and gene analysis.

The primary aim of the paper is to study and implement feature extraction algorithms. Obtaining the feature vector is an NP problem. Many scholars are currently studying feature extraction, and several new methods have emerged. Many of these methods assign each word a weight based on its frequency and position, and select the words with the largest weights.

The paper puts forward two methods based on the component analysis algorithm:

① Feature extraction based on principal component analysis. It combines SVD with principal component analysis to find the latent conceptual structure. It expresses the original features as combinations of the principal components, so that the internal relations among them can be used to explain the texts.

② A genetic algorithm based on vector similarity. Obtaining the text feature vector is recast as an optimization search in the Web text space. A better individual reflects the text more closely and contains information carried by the other chromosomes; that is, it has greater similarity with the other individuals. The paper maps each individual, which is composed of distinct feature words, to a vector in the space spanned by the common components of the component analysis algorithm. It then searches the problem domain repeatedly, guided by vector similarity, to obtain the best feature vector.

Finally, the paper introduces the design and implementation of the system and presents the experimental results of the two feature extraction algorithms.
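The first method in the abstract, principal component analysis computed through SVD, can be illustrated with a minimal sketch. The abstract does not give the thesis's exact procedure, so everything here is an assumption: the function name `pca_via_svd`, the toy term-frequency matrix, and the choice of `k` are hypothetical and only show the general technique of centering a document-by-term matrix and projecting it onto its leading right singular vectors.

```python
import numpy as np

def pca_via_svd(X, k):
    """Project the rows of X (documents x terms) onto the top-k principal components.

    Hypothetical helper for illustration; not the thesis's implementation.
    """
    Xc = X - X.mean(axis=0)                # center each term column
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:k]                    # (k, n_terms): principal directions
    reduced = Xc @ components.T            # (n_docs, k): low-dimensional features
    explained = (s[:k] ** 2) / np.sum(s ** 2)  # share of variance per component
    return reduced, components, explained

# Toy term-frequency matrix (made-up data): 4 documents x 5 terms.
X = np.array([
    [2, 1, 0, 0, 0],
    [1, 2, 0, 0, 1],
    [0, 0, 3, 1, 0],
    [0, 0, 1, 2, 1],
], dtype=float)

reduced, components, explained = pca_via_svd(X, k=2)
print(reduced.shape)     # each document is now described by 2 features
print(explained)
```

Each row of `components` is a weighted combination of the original terms, which matches the abstract's description of expressing the original features through the principal components.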
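The second method in the abstract, a genetic search guided by vector similarity, might be sketched as follows. The abstract does not specify the encoding, the fitness function, or the genetic operators, so all of these are assumptions made for illustration: an individual is a fixed-size set of term indices, it is mapped into component space by summing hypothetical term loadings, and its fitness is its mean cosine similarity to the other individuals in the population.

```python
import numpy as np

def to_component_space(individual, loadings):
    # Sum the loadings (n_terms x n_components) of the individual's feature words.
    return loadings[individual].sum(axis=0)

def fitness(population, loadings):
    # Assumed fitness: mean cosine similarity to every other individual.
    vecs = np.array([to_component_space(ind, loadings) for ind in population])
    norms = np.linalg.norm(vecs, axis=1, keepdims=True)
    unit = vecs / np.where(norms == 0, 1.0, norms)
    sim = unit @ unit.T
    return (sim.sum(axis=1) - np.diag(sim)) / (len(population) - 1)

def crossover(a, b, size, rng):
    # Child draws distinct feature words from the union of both parents.
    return rng.choice(np.union1d(a, b), size=size, replace=False)

def mutate(ind, n_terms, rate, rng):
    # Replace a feature word by an unused one with probability `rate`.
    ind = ind.copy()
    for i in range(len(ind)):
        if rng.random() < rate:
            unused = np.setdiff1d(np.arange(n_terms), ind)
            ind[i] = rng.choice(unused)
    return ind

def evolve(loadings, size=3, pop_size=20, generations=30, rate=0.1, seed=0):
    rng = np.random.default_rng(seed)
    n_terms = loadings.shape[0]
    population = [rng.choice(n_terms, size=size, replace=False)
                  for _ in range(pop_size)]
    for _ in range(generations):
        scores = fitness(population, loadings)
        order = np.argsort(scores)[::-1]
        parents = [population[i] for i in order[: pop_size // 2]]  # keep top half
        children = []
        while len(parents) + len(children) < pop_size:
            i, j = rng.choice(len(parents), size=2, replace=False)
            child = crossover(parents[i], parents[j], size, rng)
            children.append(mutate(child, n_terms, rate, rng))
        population = parents + children
    scores = fitness(population, loadings)
    return np.sort(population[int(np.argmax(scores))]), float(scores.max())

# Made-up loadings for 8 terms on 2 components, for demonstration only.
loadings = np.random.default_rng(1).normal(size=(8, 2))
best, score = evolve(loadings)
print(best, score)
```

Because the fitness here is measured relative to the current population, the search can converge early on a cluster of similar individuals; a real system would likely need a diversity mechanism or an external reference vector, neither of which the abstract describes.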

Keywords/Search Tags:

Web text mining, feature extraction, component analysis, singular value decomposition, genetic algorithm

Related items

1	Research On Key Technology Of Signal Processing Based On Singular Value Decomposition
2	Phase Extraction Method Of Projection Fringe Images Based On Singular Value Decomposition And Neural Network
3	The Research And Simulation On The Key Techniques Of Text Mining
4	Food Matching Recommendation Based On Component Feature Extraction And Latent Semantic Analysis
5	Research On Multi-modal Feature Extraction Based On Subclass Discriminant Analysis And Generalized Singular Value Decomposition
6	Characteristic Extraction Research Of Rotating Machinery Vibration Signal
7	Research On Image Quality Assessment Based On Visual Perception And Local Feature Extraction
8	Research On Feature Extraction Algorithms In Face Recognition
9	Research On Text Feature Extraction Based On A Method Named CM-RS
10	EEG Feature Extraction And Online BCI Research