
Research on the Key Techniques of Text Mining

Posted on: 2006-12-06
Degree: Doctor
Type: Dissertation
Country: China
Candidate: X Y Chen
Full Text: PDF
GTID: 1118360155460408
Subject: Computer software and theory
Abstract/Summary:
With the rapid development and spread of the Internet, the volume of electronic information has grown enormously. How to collect and find the information a user is interested in, and how to discover latent, useful knowledge quickly, accurately, and completely, has become a focus of information science and technology. Data mining (DM) is a research field that addresses this problem. Since the concept of DM emerged in the 1990s, research on it has become extensive, covering association analysis, classification, cluster analysis, trend analysis, and so on. DM mainly targets structured data such as relational databases, but in practice most information exists as unstructured data; some figures indicate that unstructured data accounts for 80% of existing information sources. Mining unstructured information has therefore followed DM as a new challenge.

Text is the most widely used form of information among common kinds of unstructured data such as text, images, and video. It appears in digital libraries, product catalogs, newsgroups, medical reports, and organizational or personal homepages, and it is also applied broadly in fields such as natural language understanding, text summarization, information extraction, information filtering, and information retrieval. Its commercial value is therefore higher than that of DM.

This dissertation studies key techniques of text mining, including text feature extraction and feature selection, text association analysis, and text association classification. Several methods and techniques are presented that aim to improve speed, precision, and stability. The primary contributions are as follows.

(1) The dissertation presents a feature evaluation function based on document frequency with a minimum term frequency threshold, which reduces the proportion of noise features and improves the quality of text categorization. At present, feature evaluation functions are the main means of selecting text features for text categorization. These functions differ in that some use term frequency and others use document frequency. Experimental results show that mutual information, information gain, and the χ² statistic are more effective with a minimum term frequency threshold than with plain document frequency.

(2) Research on mining the top N most frequent itemsets in a text collection. Frequent itemset mining is an important step in text association analysis, but it is very difficult to choose a suitable minimum support threshold. The dissertation presents a strategy...
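Contribution (1) can be illustrated with a minimal Python sketch. The abstract does not define the evaluation function precisely, so the sketch assumes one plausible reading: a document counts toward a term's document frequency only if the term occurs in it at least `min_tf` times, and the resulting counts feed the standard χ² statistic. The function name `chi_square_scores` and its parameters are hypothetical and do not come from the dissertation.

```python
from collections import Counter
from typing import Dict, List, Sequence


def chi_square_scores(
    docs: Sequence[List[str]],
    labels: Sequence[str],
    target: str,
    min_tf: int = 2,
) -> Dict[str, float]:
    """Score terms for class `target` with the chi-square statistic.

    Assumption (not from the source): a document "contains" a term
    only if the term occurs at least `min_tf` times in that document.
    """
    n = len(docs)
    n_pos = sum(1 for y in labels if y == target)
    n_neg = n - n_pos

    df_pos: Counter = Counter()  # thresholded document frequency in target class
    df_neg: Counter = Counter()  # thresholded document frequency elsewhere
    for tokens, y in zip(docs, labels):
        tf = Counter(tokens)
        present = [term for term, count in tf.items() if count >= min_tf]
        (df_pos if y == target else df_neg).update(present)

    scores: Dict[str, float] = {}
    for term in set(df_pos) | set(df_neg):
        a = df_pos[term]   # target docs containing the term
        b = df_neg[term]   # other docs containing the term
        c = n_pos - a      # target docs without the term
        d = n_neg - b      # other docs without the term
        denom = (a + c) * (b + d) * (a + b) * (c + d)
        scores[term] = n * (a * d - b * c) ** 2 / denom if denom else 0.0
    return scores
```

Under this reading, a term that appears only once per document never enters the candidate set when `min_tf=2`, which is one way a term frequency threshold could reduce the share of noise features.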
Keywords/Search Tags:Text Mining, Feature Selection, Text Association Analysis, Text Association Categorization, Rule Intensity, Boosting Technique
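To make the goal of contribution (2) concrete, the following Python sketch shows what "top N most frequent itemsets" means when no minimum support threshold is supplied. The abstract is cut off before it describes the dissertation's strategy, so this is not that strategy: it is a brute-force count over small itemsets, suitable only for tiny collections. The function name `top_n_itemsets` and the `max_size` and `min_size` parameters are assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations
from typing import Iterable, List, Sequence, Tuple


def top_n_itemsets(
    transactions: Iterable[Sequence[str]],
    n: int,
    max_size: int = 2,
    min_size: int = 1,
) -> List[Tuple[Tuple[str, ...], int]]:
    """Return the n most frequent itemsets with their support counts.

    Brute force: enumerates every itemset of size min_size..max_size in
    each transaction, so it is only practical for small inputs.
    """
    counts: Counter = Counter()
    for items in transactions:
        unique = sorted(set(items))  # each term counts once per document
        for k in range(min_size, max_size + 1):
            counts.update(combinations(unique, k))
    # Sort by support (descending); break ties by size, then alphabetically.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], len(kv[0]), kv[0]))
    return ranked[:n]
```

The caller fixes how many itemsets are wanted rather than guessing a support threshold, which is the difficulty the abstract points to.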