
Research And Realization Of Topic Extraction Based On Text Mining

Posted on: 2011-06-05  Degree: Master  Type: Thesis
Country: China  Candidate: J Lu  Full Text: PDF
GTID: 2178360308457335  Subject: Computer application technology
Abstract/Summary:
With the arrival of the information age, the Internet is developing rapidly. Digital libraries and digital offices have become common, and the amount of information on the network keeps growing. People have little time to read so much material, so there is an urgent need to obtain the knowledge they want quickly. Existing topic extraction systems meet this need only in part, and their efficiency and precision are low. This thesis analyzes the defects of the existing systems, improves on them, and proposes a new text-based topic extraction technique.

Text-based topic extraction builds on text mining. First, one document from a given field and hundreds of documents from other fields are chosen. The documents are then represented and their sentences are segmented into words. Finally, the word frequencies are calculated and, from them, a weight for each word. The weight represents the importance of the word in the field. Usually, the more important a word is, the more often it appears in the field. The words can therefore be ranked by weight, and the top-ranked words are added to the topic lexicon according to a set proportion. In addition, many documents consist of a title, an abstract, and keywords as well as body text. These parts represent the topic of the document better than the body text does, so we handle them by calculating text similarity in order to extract the topic accurately. Compound word connection is also an important step in topic extraction. Synonyms must be handled as well, because a word such as "father" has the same meaning as "ba ba". When a word is added to the topic lexicon, we look it up in the synonym table and add its synonyms too. We also add parameters to the topic lexicon to improve efficiency and precision.

With the help of the topic lexicon, we represent the document whose topic is to be extracted and segment it into words using the lexicon. We then calculate the word frequencies and the weights.
Finally, we obtain the topic of the document, such as "sport-->football-->free kick". The experiments show that the precision is above 80% and that it increases with the number of background documents. The reason is that the word weights become more significant as the number of words grows. In the topic extraction experiments, with the help of the topic lexicon, the method extracts the topic of a document accurately and quickly, and thus helps people work more efficiently. In summary, by improving on traditional topic extraction methods, the text-based technique offers a new method of topic extraction with better efficiency and precision. More importantly, it can improve itself continuously.
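The lexicon-building step described above (word frequency, weight, ranking, synonym lookup) can be sketched roughly as follows. The abstract does not give the weighting formula, so this sketch assumes a TF-IDF-style weight: frequency in the field documents multiplied by a log-scaled rarity in the background documents. The function names `tokenize`, `word_weights`, and `build_topic_lexicon`, the `ratio` parameter, and the regular-expression tokenizer (a stand-in for real word segmentation) are all hypothetical and not taken from the thesis.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and keep alphabetic runs (stand-in for real word segmentation)."""
    return re.findall(r"[a-z]+", text.lower())


def word_weights(field_docs, background_docs):
    """Weight each field word by its frequency and its rarity in the background.

    ASSUMPTION: this TF-IDF-style formula is illustrative only; the abstract
    does not state the weighting function actually used.
    """
    field_counts = Counter(w for doc in field_docs for w in tokenize(doc))
    total = sum(field_counts.values())
    background_sets = [set(tokenize(doc)) for doc in background_docs]
    n_background = len(background_sets)

    weights = {}
    for word, count in field_counts.items():
        # Number of background documents that contain the word.
        doc_freq = sum(1 for words in background_sets if word in words)
        # Smoothed inverse document frequency; 0 if the word is in every background doc.
        idf = math.log((1 + n_background) / (1 + doc_freq))
        weights[word] = (count / total) * idf
    return weights


def build_topic_lexicon(weights, ratio=0.2, synonyms=None):
    """Keep the top `ratio` share of words by weight and add their synonyms."""
    synonyms = synonyms or {}
    # Sort by descending weight, breaking ties alphabetically for determinism.
    ranked = sorted(weights, key=lambda w: (-weights[w], w))
    keep = max(1, math.ceil(len(ranked) * ratio))
    lexicon = set()
    for word in ranked[:keep]:
        lexicon.add(word)
        lexicon.update(synonyms.get(word, ()))
    return lexicon


# Toy data, invented for illustration only.
field_docs = [
    "the striker scored from a free kick",
    "a free kick and a penalty decided the football match",
]
background_docs = [
    "the market fell as the bank raised rates",
    "a new phone and a new laptop were released",
    "the recipe needs a pinch of salt",
]
weights = word_weights(field_docs, background_docs)
print(build_topic_lexicon(weights, ratio=0.15, synonyms={"kick": ["boot"]}))
```

Under this assumed formula, words that are frequent in the field but absent from the background documents rank highest, while common words such as "the" receive low weights. Adding more background documents gives the rarity term more evidence to work with.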
Keywords/Search Tags:Data mining, Text mining, Text classifying, Topic lexicon, Topic extraction