The Research And Application Of Text Association Rule Mining Method

Posted on:2011-11-02

Degree:Master

Type:Thesis

Country:China

Candidate:X G Wan

GTID:2178360305967457

Subject:Computer software and theory

Abstract/Summary:

With the spread of the Internet and the growing informatization of enterprises, text information is accumulating rapidly. A convenient and effective tool that can extract concise, comprehensible knowledge from it is urgently needed, and text mining is a research area that addresses this problem. Because text data is semi-structured or unstructured, it differs greatly from database data; text association rule mining therefore shares its goal with database association rule mining but differs in implementation technique.

This thesis studies the key techniques and methods of text association rule mining and describes its process, against the background of mining user requirements in the machinery industry. The following work has been done:

1. After comparing a variety of Chinese word segmentation tools, ICTCLAS is used to split the requirements into part-of-speech tagged words. Chinese function words are filtered out by their part-of-speech tags, which gives a rough dimension reduction. Chinese word segmentation followed by stop word removal yields the candidate features of the requirements.

2. The candidate features of each document are sorted by frequency in descending order, and high-frequency words are selected as document features until the cumulative frequency reaches a given threshold. To avoid removing low-frequency specialized words, the specialized words that appear in the document are also selected by means of a term library. The document feature set is generated by collecting the features of all documents. Non-specialized features with high document frequency are then removed by feature dimension reduction using the document frequency (DF) method.

3. The text feature vector space model of the document set is generated by using the vector space model (VSM) to transform the semi-structured or unstructured text data into structured vectors.

4. With the text feature vector space model as input, the correlation between specialized words and non-specialized words is calculated with the grey correlation formula.

5. Based on the calculated correlations, a second feature reduction is carried out using a professional feature array, which is also used to describe the document set with a limited number of feature words.

Finally, an experiment was carried out on 19 excavator requirements, and the application of the feature words obtained by data mining in a mechanism design information processing system is described in detail.
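
Steps 2 to 4 of the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation: the function names, the thresholds (0.7 and 0.9), the toy documents and the term library are all hypothetical. The abstract does not state which grey correlation formula is used, so the sketch assumes Deng's grey relational grade with distinguishing coefficient rho = 0.5 and applies it to raw term counts without normalization.

```python
from collections import Counter


def select_document_features(tokens, term_library, threshold=0.8):
    """Step 2 (per document): keep high-frequency words plus specialized words."""
    counts = Counter(tokens)
    total = sum(counts.values())
    selected, cumulative = [], 0
    # Walk words from most to least frequent until their cumulative
    # share of all tokens reaches the (hypothetical) threshold.
    for word, count in counts.most_common():
        if cumulative / total >= threshold:
            break
        selected.append(word)
        cumulative += count
    # Keep specialized words from the term library even if they are rare.
    selected += [w for w in counts if w in term_library and w not in selected]
    return selected


def drop_common_features(doc_features, term_library, max_df_ratio=0.9):
    """Step 2 (document set): drop non-specialized words with high document frequency."""
    n_docs = len(doc_features)
    df = Counter(w for feats in doc_features for w in set(feats))
    return sorted(w for w in df
                  if w in term_library or df[w] / n_docs <= max_df_ratio)


def build_vsm(docs, features):
    """Step 3: rows are documents, columns are features, values are term counts."""
    rows = []
    for tokens in docs:
        counts = Counter(tokens)
        rows.append([counts[f] for f in features])
    return rows


def grey_relational_grades(reference, comparisons, rho=0.5):
    """Step 4 (assumed formula): Deng's grey relational grade of each
    comparison sequence with respect to the reference sequence."""
    deltas = [[abs(r - c) for r, c in zip(reference, seq)] for seq in comparisons]
    flat = [d for row in deltas for d in row]
    d_min, d_max = min(flat), max(flat)
    if d_max == 0:  # every sequence equals the reference
        return [1.0] * len(comparisons)
    return [sum((d_min + rho * d_max) / (d + rho * d_max) for d in row) / len(row)
            for row in deltas]


# Toy, already segmented documents (hypothetical data).
docs = [
    ["bucket", "bucket", "bucket", "engine", "engine", "boom", "fuel"],
    ["bucket", "engine", "engine", "fuel", "fuel"],
    ["boom", "boom", "engine", "fuel"],
]
term_library = {"bucket", "boom"}  # hypothetical specialized words

doc_features = [select_document_features(d, term_library, threshold=0.7) for d in docs]
features = drop_common_features(doc_features, term_library, max_df_ratio=0.9)
vsm = build_vsm(docs, features)

# Compare the specialized word "bucket" with the non-specialized word "fuel",
# using each word's column of the VSM as its sequence across documents.
column = {f: [row[i] for row in vsm] for i, f in enumerate(features)}
grades = grey_relational_grades(column["bucket"], [column["fuel"]])
```

In this sketch each word is represented by its column of the VSM, so a grade close to 1 means the non-specialized word's counts follow the specialized word's counts closely across documents. A second feature reduction like the one in step 5 could then keep only the non-specialized words whose grade exceeds some cutoff, although the abstract does not specify how that cutoff is chosen.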

Keywords/Search Tags:

Chinese word segmentation, feature extraction, feature dimension reduction, VSM, grey correlation degree

Related items

1	A Research On Spatial Cognition And Representation Of Environment Based On Grey Qualitative Method
2	Chinese Keyword Extraction Method Based On Word Span And Its Application In Text Classification
3	Research On Multi-feature Fusion Target Tracking Algorithm Based On Correlation Filtering
4	Feature Dimension Reduction For Neonatal Pain Facial Expression Recognition
5	Research On Chinese Word Segmentation Method Based On Statistical Learning
6	Unsupervised Clustering Algorithm Based On Dimension Reduction
7	Research And Application Of Internet Chinese Text Classification
8	An Unsupervised Feature Selection Method Based On The Degree Of Feature Cooperation
9	Research On Feature Learning Algorithm For High-dimensional Data
10	Study On Feature Extraction Based On Maximizing The Distance Between Classes