The Research & Realization On The Key Techniques Of Text Mining

Posted on: 2008-10-18
Degree: Master
Type: Thesis
Country: China
Candidate: G J Xu
Full Text: PDF
GTID: 2178360242460584
Subject: Computer application technology
Abstract/Summary:
With the rapid development and gradual maturation of Internet technology, the amount of information online has grown explosively, yet a problem has emerged: information is abundant but knowledge is scarce. Because the Internet is open and heterogeneous, it is difficult for users to find valuable information quickly and accurately. Text mining has therefore become an important research topic that has drawn increasing attention in recent years.

Text mining lies at the intersection of two fields: data mining and information retrieval. It can summarize documents, classify and cluster them, analyze associations among them, and forecast trends in their content. Doing so requires preprocessing the text first: representing each document by its features and extracting the information it contains. Chinese text, however, differs in form from Western languages, which makes these techniques considerably harder to apply. Although research on text preprocessing has made some progress, much work remains, because text information is still not extracted very precisely.

This dissertation focuses on text mining, and its main work is as follows:
1. It begins with Chinese word segmentation, introduces several segmentation methods, and designs a segmentation system.
2. It analyzes techniques for extracting text features and compares their performance, with particular attention to Information Gain, Mutual Information, and word frequency statistics. It also proposes an improved method for extracting document features.
3. It analyzes text classification techniques, implements text classification using an improved weighting scheme together with KNN, and validates recall and precision experimentally.
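The three steps named in this abstract can be illustrated with a minimal Python sketch. It is not the system built in the thesis: the abstract does not say which segmentation methods or which improved weighting were used, so the sketch assumes forward maximum matching (a common dictionary-based segmentation method), the standard presence-based form of Information Gain, and a KNN vote weighted by cosine similarity. All function names, the toy vocabulary, and the parameters `max_len` and `k` are hypothetical.

```python
import math
from collections import Counter

def fmm_segment(text, vocab, max_len=4):
    """Forward maximum matching: greedily take the longest dictionary word.
    Falls back to a single character when no dictionary word matches."""
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocab:
                tokens.append(piece)
                i += size
                break
    return tokens

def entropy(counts):
    """Shannon entropy (base 2) of a collection of class counts."""
    counts = list(counts)
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c)

def information_gain(term, docs, labels):
    """IG(t) = H(C) - P(t) * H(C | t) - P(not t) * H(C | not t).
    Each document in `docs` is a set of terms (presence only)."""
    with_t = Counter(l for d, l in zip(docs, labels) if term in d)
    without_t = Counter(l for d, l in zip(docs, labels) if term not in d)
    gain = entropy(Counter(labels).values())
    for part in (with_t, without_t):
        size = sum(part.values())
        if size:
            gain -= size / len(docs) * entropy(part.values())
    return gain

def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def knn_classify(query, train_vecs, labels, k=3):
    """Similarity-weighted KNN: each of the k nearest neighbours votes
    for its label with a weight equal to its cosine similarity."""
    scored = sorted(((cosine(query, v), l) for v, l in zip(train_vecs, labels)),
                    key=lambda pair: pair[0], reverse=True)[:k]
    votes = Counter()
    for sim, label in scored:
        votes[label] += sim
    return votes.most_common(1)[0][0]

# Toy usage with a hypothetical four-word dictionary.
vocab = {"文本", "挖掘", "文本挖掘", "技术"}
print(fmm_segment("文本挖掘技术", vocab))  # ['文本挖掘', '技术']
```

Forward maximum matching is shown only because it is short. It resolves ambiguity purely by preferring the longest match, so it does not address the ambiguity elimination listed among the keywords.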
Keywords/Search Tags: Text Mining, Chinese Text Segment, Ambiguities Elimination, Character Drawing, Text Classification