Research On Algorithms For Machine Learning And Text Mining

Posted on:2003-09-29

Degree:Doctor

Type:Dissertation

Country:China

Candidate:Q He

Full Text:PDF

GTID:1118360185995730

Subject:Computer software and theory

Abstract/Summary:


This dissertation studies several algorithms for machine learning and text mining. Classifying massive data sets with a Support Vector Machine (SVM) is difficult. To address this problem, the first part proposes a new general classification method based on hypersurfaces, whose theoretical basis is the Jordan Curve Theorem. The contributions of the first part are as follows:

1) The existence and the geometric construction of a separating hypersurface are studied, and the Classification method based on Geometric HyperSurface (GHSC) is proposed. Its main characteristics are: i) It solves nonlinear classification problems directly, requiring neither a kernel function nor a mapping from a lower-dimensional space to a higher-dimensional one. ii) It is a general and practical way to construct a separating hypersurface. iii) It classifies a sample according to whether its winding number is odd or even, which makes classification with a non-convex hypersurface convenient and manageable. iv) It is suitable for massive data sets and is expected to handle high-dimensional data.

2) Using GHSC, classification programs are designed for 2-dimensional and 3-dimensional space. Experiments on typical nonlinear data show that the separating hypersurface method can effectively classify very large data sets (on the order of 10^7) and can classify data distributed over very complex regions. The results indicate that the method improves both classification efficiency and accuracy.

3) We explore a generalization of GHSC that efficiently handles multi-class classification problems.

4) For high-dimensional data, we adopt algebraic hypersurfaces for classification. An adaptive algorithm for choosing the order of the algebraic hypersurface is proposed to avoid complex computation.

In the second part, to meet the needs of large-scale text mining, several text mining techniques are studied: text information extraction, text clustering, multi-text summarization, concept and semantic space, and semantic indexing and retrieval. In more detail:

1) An HMM (Hidden Markov Model) for specific BibTeX entries is built and then extended to open data sets. The model is then optimized by introducing smoothing techniques and extraction rules. Experiments show that both smoothing and extraction rules are effective optimizations that improve the accuracy of information extraction.

2) We select the SOM (self-organizing maps) and fuzzy clustering for the...
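The odd/even rule described for GHSC in the first part can be illustrated in two dimensions with the standard ray-crossing test that follows from the Jordan Curve Theorem. The sketch below is not the thesis's implementation: it assumes the separating hypersurface is given as a simple closed polygon, and the names `crossing_count`, `classify`, and `U_SHAPE` are hypothetical. A sample is assigned class 1 when a ray from it crosses the boundary an odd number of times, and class 0 otherwise.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def crossing_count(sample: Point, boundary: List[Point]) -> int:
    """Count the boundary edges crossed by a horizontal ray from `sample` toward +x.

    `boundary` is assumed to be a simple closed polygon, listed vertex by vertex.
    """
    x, y = sample
    count = 0
    n = len(boundary)
    for i in range(n):
        (x1, y1), (x2, y2) = boundary[i], boundary[(i + 1) % n]
        # The edge straddles the ray's height; the strict/non-strict mix
        # keeps a shared vertex from being counted twice.
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge meets the horizontal line through the sample.
            x_hit = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x_hit > x:
                count += 1
    return count


def classify(sample: Point, boundary: List[Point]) -> int:
    """Return 1 for an odd crossing count (inside the curve), 0 for an even one."""
    return crossing_count(sample, boundary) % 2


# A non-convex, U-shaped region used only as illustrative data.
U_SHAPE: List[Point] = [
    (0, 0), (6, 0), (6, 6), (4, 6), (4, 2), (2, 2), (2, 6), (0, 6),
]

if __name__ == "__main__":
    for p in [(1, 4), (3, 4), (5, 4), (7, 4), (3, 1)]:
        print(p, "-> class", classify(p, U_SHAPE))
```

No kernel function or mapping to a higher-dimensional space appears here, and the region does not need to be convex: the point (3, 4) lies in the notch of the U and is assigned class 0, while (1, 4) and (5, 4) in the two arms are assigned class 1.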

Keywords/Search Tags:

Machine Learning, the classification method based on hypersurface, text clustering, Hidden Markov Model, Information Extraction, Self-Organizing Maps (SOM), multi-abstract, concept semantic space, fuzzy direct cluster, semantic index


Related items

1	Text Classification Based On Hidden Markov Model And Semantic Fusion
2	The Model Of Text Concept Semantic Space And Its Applications
3	Research On Semantic Web Fuzzy Ontology Construction Based On FFCA
4	Algorithm Research For Text Information Extraction Based On Hidden Markov Model
5	Research On Text Clustering Based On Latent Semantic Analysis And Self-organizing Maps
6	Web Text Information Extraction And Classification
7	Case Studies For Semantic Aware Statistical Machine Learning Applications In Code Security Problems
8	Research On Hierarchical Classification Methods For Chinese Texts And The Related Application
9	Study On Semantic Information Extraction From Web Page
10	Research Of Text Clustering Based On Self-Organizing Maps