Research On Feature Selection Methods And Its Applications In Text Clustering

Posted on:2016-12-12

Degree:Master

Type:Thesis

Country:China

Candidate:H Z Yu

GTID:2308330461483503

Subject:Management Science and Engineering

Abstract/Summary:

Text data mining has become an important area of research. Its object is text data from various sources, and it helps people mine and analyze text content and discover patterns in text. Text clustering is a vital task in text mining: it helps enterprises and users summarize text data. However, high-dimensional, sparse text features degrade clustering performance, so an effective feature selection method is key to improving text clustering. This thesis studies feature selection methods for text clustering and applies them to telecom customer complaint data. The specific research content is as follows.

This thesis proposes FS-CR, a feature selection method based on text clustering results. The method first clusters the original text corpus to obtain an initial clustering result. Then, using the initial cluster assignments as category labels, it calculates the information gain of every feature and selects the important ones. Finally, it clusters the corpus again using only the selected features to obtain a better clustering result. FS-CR is compared with existing feature selection methods, such as document frequency and term contribution, in three experiments, evaluated by F-measure and feature compression ratio. The results show that FS-CR achieves higher F-measure values with a small number of effective features, and that the method is feasible.

Traditional weight calculation methods consider only feature frequency and document frequency, yet text contains a large amount of semantic information. This thesis introduces a location factor and a paragraph co-occurrence factor, and proposes a new feature selection method, FS-SI-CR, based on text semantic information and clustering results.
By introducing semantic information, the weight of text themes is strengthened, which optimizes the initial text clustering results and in turn improves the final clustering. FS-SI-CR is compared with FS-CR and with term contribution using semantic information. Experimental results show that FS-SI-CR is superior to the other feature selection methods both in overall clustering effect and on individual text categories.

Existing telecom customer complaint data is text data with no category information. Because telecom customer complaints are short texts, paragraph co-occurrence semantic information is replaced by sentence co-occurrence semantic information. First, this thesis puts forward a text mining framework for customer complaints in the telecommunications industry. Then text preprocessing and the FS-SI-CR method are applied to the telecom customer complaint texts. The clustering results show that FS-SI-CR works well in this application and can select a small number of effective features. By analyzing the features of the different categories, customer complaint issues can be discovered, which improves complaint handling efficiency and reduces labor costs. Importantly, this provides decision support for telecom enterprise managers.
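
The three FS-CR steps summarized in the abstract (cluster, score features by information gain against the cluster labels, cluster again on the selected features) can be sketched as follows. This is only an illustration under stated assumptions, not the thesis's implementation: documents are represented as sets of tokens, information gain is computed on term presence or absence, and the names `entropy`, `information_gain`, `select_features`, `fs_cr`, and the `cluster` callable are all hypothetical. The abstract does not say which clustering algorithm or term weighting is used, so the clustering step is passed in as a function.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy in bits of a distribution given as raw counts."""
    counts = [c for c in counts if c > 0]
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts)


def information_gain(docs, labels, term):
    """Information gain of a term's presence/absence w.r.t. the labels.

    docs   : list of sets of tokens (assumed representation)
    labels : one cluster label per document, used as pseudo-categories
    """
    with_term = Counter(l for d, l in zip(docs, labels) if term in d)
    without_term = Counter(l for d, l in zip(docs, labels) if term not in d)
    n = len(docs)
    n_with = sum(with_term.values())
    n_without = n - n_with
    h_labels = entropy(Counter(labels).values())
    h_conditional = (
        (n_with / n) * entropy(with_term.values())
        + (n_without / n) * entropy(without_term.values())
    )
    return h_labels - h_conditional


def select_features(docs, labels, keep):
    """Return the `keep` terms with the highest information gain.

    Ties are broken alphabetically so the result is deterministic.
    """
    vocabulary = set().union(*docs)
    ranked = sorted(
        vocabulary,
        key=lambda t: (-information_gain(docs, labels, t), t),
    )
    return ranked[:keep]


def fs_cr(docs, cluster, keep):
    """Sketch of the cluster / select / re-cluster loop.

    `cluster` is any callable mapping a list of token sets to a list of
    labels; it stands in for whatever clustering algorithm is used.
    """
    initial_labels = cluster(docs)                      # step 1: initial clustering
    features = set(select_features(docs, initial_labels, keep))  # step 2: select by IG
    reduced_docs = [d & features for d in docs]         # drop unselected terms
    final_labels = cluster(reduced_docs)                # step 3: cluster again
    return features, final_labels
```

In this sketch, a term that appears in every document of one cluster and in no document of the others receives the maximum information gain, which is why such terms survive the selection step.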

Keywords/Search Tags:

Data Mining, Text Clustering, Feature Selection, Customer Complaints

Related items

1	Research On Text Preprocessing And Summarization Technology For Customer Complaints In Telecommunication Industry
2	Key Techniques Of Text Mining On Criminal Cases
3	Data Mining And Feature Selection Of High Dimensional Biomedical Data Based On TCGA And Pubmed Databases
4	Research On The Application Of Feature Screening And Clustering Algorithm In Text Mining
5	Research On Key Problems In Text Mining Based On Cloud Method
6	Research Of Chinese Web Text Clustering Technology
7	Customer Value Evaluation System Of Mining And Customer Value Clustering
8	Clustering Mining In Telecom Customer Classification Of Research And Application
9	Research On The Model Of Communication Customer Churn Based On Data Mining
10	Research And Implementation Of Key Technologies On Web Text Mining