The Research And Application Of Text Association Rule Mining Method

Posted on: 2011-11-02
Degree: Master
Type: Thesis
Country: China
Candidate: X G Wan
Full Text: PDF
GTID: 2178360305967457
Subject: Computer software and theory
Abstract/Summary:
With the widespread adoption of the Internet and the growth of enterprise informatization, text information is accumulating rapidly. A convenient and effective tool for extracting concise, comprehensible knowledge from it is urgently needed, and text mining is a research area that addresses this problem. Text data differs greatly from database data in that it is semi-structured or unstructured, so text association rule mining shares its aim with database association rule mining but differs in implementation technique.

In this thesis, the key techniques and methods are studied, and the process of text association rule mining is described against the background of mining user requirements in the machinery industry. The following work has been done:

1. After comparing a variety of Chinese word segmentation tools, ICTCLAS is used to split the requirements into part-of-speech tagged words. Chinese function words are filtered out by their part-of-speech tags as a rough first dimension reduction. The candidate features of the requirements are obtained through Chinese word segmentation and stop word removal.
2. The candidate features of each document are sorted by frequency in descending order, and the high-frequency words are selected as document features until the cumulative frequency reaches a certain threshold. To avoid discarding low-frequency specialized words, the specialized words that appear in the document are selected using a term library. The document feature set is generated by collecting the features of all documents. Non-specialized features with high document frequency are then removed by feature dimension reduction using the DF (document frequency) method.
3. A text feature vector space model of the document set is generated by transforming the semi-structured or unstructured text data into structured vectors with the vector space model (VSM).
4. With the text feature vector space model as input, the correlation between specialized words and non-specialized words is calculated with the grey correlation formula.
5. Based on the calculated correlations, a second feature reduction is carried out using a professional features array, which is also used to describe the document set with a certain number of feature words.

Finally, an experiment was carried out on 19 excavator requirements. The application of the feature words obtained by data mining in a mechanism design information processing system is described in detail.
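
Two of the steps above, cumulative-frequency feature selection (step 2) and the grey correlation calculation (step 4), can be illustrated with a minimal sketch. The abstract does not give the thesis's exact formulas or threshold, so everything here is an assumption: the function names are hypothetical, the 0.6 threshold is arbitrary, and the grey correlation is computed as Deng's grey relational grade with distinguishing coefficient 0.5, with the minimum and maximum differences taken over a single pair of sequences rather than over a whole set of comparison sequences.

```python
from collections import Counter


def select_by_cumulative_frequency(tokens, threshold=0.6):
    """Keep the most frequent words until their combined share of all
    tokens reaches `threshold` (an assumed value, not from the thesis)."""
    counts = Counter(tokens)
    total = sum(counts.values())
    selected, cumulative = [], 0
    for word, count in counts.most_common():
        # Stop once the words already kept cover the required share.
        if total == 0 or cumulative / total >= threshold:
            break
        selected.append(word)
        cumulative += count
    return selected


def grey_relational_grade(reference, comparison, rho=0.5):
    """Deng's grey relational grade between two equal-length sequences.

    Assumption: this stands in for the thesis's "grey correlation
    formula", which the abstract does not spell out. `rho` is the
    distinguishing coefficient, conventionally 0.5.
    """
    if len(reference) != len(comparison) or not reference:
        raise ValueError("sequences must be non-empty and of equal length")
    deltas = [abs(r - c) for r, c in zip(reference, comparison)]
    d_min, d_max = min(deltas), max(deltas)
    if d_max == 0:
        return 1.0  # identical sequences are fully correlated
    coefficients = [(d_min + rho * d_max) / (d + rho * d_max) for d in deltas]
    return sum(coefficients) / len(coefficients)


if __name__ == "__main__":
    # Made-up tokens from one hypothetical excavator requirement.
    tokens = ["bucket"] * 5 + ["engine"] * 3 + ["color"] * 2
    print(select_by_cumulative_frequency(tokens))

    # Made-up occurrence vectors of two feature words across three documents.
    print(grey_relational_grade([1, 0, 1], [1, 1, 1]))
```

In a pipeline like the one described, each sequence passed to `grey_relational_grade` would be one feature word's row or column of the VSM matrix, so the grade measures how similarly a specialized and a non-specialized word are distributed across the documents.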
Keywords/Search Tags:Chinese word segmentation, feature extraction, feature dimension reduction, VSM, grey correlation degree