Text Classification Based On Natural Language Processing, Analysis And Research

Posted on:2012-10-07

Degree:Master

Type:Thesis

Country:China

Candidate:C Y Zhang

GTID:2208330335984665

Subject:Computer application technology

Abstract/Summary:

With the development of information technology, machine learning and pattern recognition have matured and are now widely applied in many areas; one important research direction is statistical natural language processing. With the rise of the Internet, the volume of electronic text written in natural language is exploding, and acquiring and managing this information effectively has become one of the main goals of natural language information processing. These problems have prompted a great deal of research and many applications in natural language processing, among which text categorization, as a foundation of information retrieval, has received particular attention. Text categorization is mainly divided into two stages and is realized using natural language processing, machine learning, pattern recognition, and text mining techniques; the value of theoretical research on text classification is therefore reflected in these technologies. Text classification can effectively improve online information retrieval: it not only improves how information is acquired but is also an important aspect of content security. Classification performance has therefore become a focus of attention, and research on text classification tasks and their engineering applications is of considerable significance. Existing work has already studied text categorization and its related technologies to some extent.
The thesis first introduces the current status and research significance of text classification. It then describes the text classification process and the technologies involved, examining Chinese word segmentation methods, feature selection methods, and text classification algorithms. Next, it presents the design of the text categorization system. In preprocessing, the best-match word segmentation method is improved in order to resolve overlapping-type ambiguities in three-character strings and to handle stop words. Feature selection is based on KL divergence combined with TF-IDF weights, so that the selected features represent the text more accurately and lay a good foundation for classification. Finally, the thesis compares the Bayes algorithm, the simple vector distance classifier, and the KNN (k-nearest neighbors) algorithm in terms of classification results and time complexity, and selects the more practical algorithm.
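
The feature selection step described above can be illustrated with a minimal Python sketch. The abstract does not give the thesis's exact formulas, so everything here is an assumption: the score used is one common formulation, the KL divergence between the class distribution given a term and the class prior, and the TF-IDF weight is the plain length-normalized term frequency times log inverse document frequency. The function names `kl_feature_scores`, `select_features`, and `tfidf_vectors` are hypothetical, and documents are assumed to be already segmented into tokens with stop words removed or not, as the caller prefers.

```python
import math
from collections import Counter, defaultdict


def kl_feature_scores(docs, labels):
    """Score each term by KL(P(c|t) || P(c)), an assumed formulation.

    docs is a list of token lists, labels the class of each document.
    A term whose presence shifts the class distribution far from the
    prior gets a high score; a term spread evenly gets a score near 0.
    """
    class_counts = Counter(labels)
    n_docs = len(docs)
    # Document frequency of each term within each class.
    term_class_df = defaultdict(Counter)
    for tokens, label in zip(docs, labels):
        for term in set(tokens):
            term_class_df[term][label] += 1

    scores = {}
    for term, per_class in term_class_df.items():
        total = sum(per_class.values())
        score = 0.0
        for cls, n_cls in class_counts.items():
            p_cls_given_term = per_class.get(cls, 0) / total
            p_cls = n_cls / n_docs
            # Zero-probability classes contribute nothing (limit of p*log p).
            if p_cls_given_term > 0:
                score += p_cls_given_term * math.log(p_cls_given_term / p_cls)
        scores[term] = score
    return scores


def select_features(scores, k):
    """Return the k highest-scoring terms (ties broken alphabetically)."""
    ranked = sorted(scores, key=lambda term: (-scores[term], term))
    return ranked[:k]


def tfidf_vectors(docs, vocabulary):
    """Weight the selected terms in each document by TF-IDF."""
    n_docs = len(docs)
    df = Counter(term for tokens in docs for term in set(tokens))
    vectors = []
    for tokens in docs:
        tf = Counter(tokens)
        vectors.append({
            term: (tf[term] / len(tokens)) * math.log(n_docs / df[term])
            for term in vocabulary
            if tf[term] > 0
        })
    return vectors


if __name__ == "__main__":
    # Toy, made-up corpus purely for illustration.
    docs = [
        ["ball", "goal", "the"],
        ["goal", "team", "the"],
        ["stock", "market", "the"],
        ["market", "bank", "the"],
    ]
    labels = ["sports", "sports", "finance", "finance"]
    scores = kl_feature_scores(docs, labels)
    vocabulary = select_features(scores, 4)
    print(vocabulary)
    print(tfidf_vectors(docs, vocabulary))
```

In this toy corpus a term such as "the", which occurs equally in both classes, scores zero and is ranked last, while class-specific terms score highest. The resulting sparse vectors could then be passed to any of the classifiers the thesis compares.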

Keywords/Search Tags:

Natural Language Processing, Text Classification, Data Mining, Feature Selection and Extraction

Related items

1	Research On Chinese Text Classification Algorithm Based On Active Learning Approach
2	Text Classification Method Based On The Longest Closed Frequent Sequential Patterns
3	Applications Of Data Mining Techniques To Text Classification And Bioinformatics
4	A Study Of Text Classification Algorithms Based On Feature Selection
5	Research On Text Classification In Data Mining
6	Studies On Key Techniques Of Text Classification And Mining For Specific Domains
7	Analyzing A Deep Feature Extraction Method For App Descriptions
8	Research On Text Classification Of Web Text Mining
9	Research On Core Technology Of The Chinese Text Classification
10	Research On Feature Selection And Multi-label Transformation Of Text Classification