Document Topic Mining And Document Static Visualization Based On LDA

Posted on: 2014-04-24
Degree: Master
Type: Thesis
Country: China
Candidate: Q K Shi
Full Text: PDF
GTID: 2268330401985888
Subject: Computer system architecture
Abstract/Summary:
With the development of information technology, information is everywhere in our lives. While enjoying the convenience of many kinds of information services, everyone has to face the fact that there is too much information to handle. Text, as the main information carrier, shows the most serious information overload. How to mine topics from a text corpus has therefore become a research focus in text mining. This thesis studies document topic mining and static document visualization based on LDA. The specific work is as follows:
(1) First, consecutive words that share the same topic tend to be semantically continuous. Based on this observation, a method for extracting the topics of a document is proposed. The method addresses the shortcoming of using single words to express a topic. Experiments show that the method has better accuracy and readability.
(2) Second, job information has two features: a) the documents are short; b) the topic of each sentence is clear. Based on these features and on (1), a method that uses LDA to mine technology topics from job information is proposed. It uses LDA to model the sentences of job information and then uses an SVM to classify the sentences and pick out those about technology. Lastly, it extracts the technology topics from the technology sentences and generates the technology topics of a job information document. Experiments show that the method achieves good results in extracting the technology topics of a document from job information.
(3) Finally, a method is proposed for producing a static topic visualization that focuses on a single document from a text corpus. Using the topic-word weight table produced by LDA together with tf-idf, it calculates the weight of each topic and then determines the layout style for the topics. Lastly, using Processing, it creates the static topic visualization for a single document. The method was used to generate the topic visualization of a job information document. Experiments show that the method achieves good performance in displaying the topics of a single document.
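The abstract does not give formulas, so the following Python sketch is only one plausible reading of steps (1) and (3), not the thesis's actual method. It assumes that each word already carries a topic id from an LDA run, that a phrase is a run of adjacent words with the same topic id, and that a topic's weight in a document is the sum of its LDA word weights multiplied by the words' tf-idf scores. The function names, the smoothed idf variant, and the final normalization are all hypothetical choices made for illustration.

```python
from itertools import groupby
from math import log


def merge_same_topic_runs(tagged_words):
    """Join adjacent words that share a topic id into one phrase.

    tagged_words: list of (word, topic_id) pairs in document order.
    Returns a list of (phrase, topic_id) pairs.
    """
    phrases = []
    for topic, run in groupby(tagged_words, key=lambda pair: pair[1]):
        phrases.append((" ".join(word for word, _ in run), topic))
    return phrases


def tf_idf(doc, corpus):
    """tf-idf score for each distinct word in doc.

    doc is a list of tokens; corpus is a list of such token lists.
    Assumption: a smoothed idf, log((1 + N) / (1 + df)) + 1, so that
    words absent from the corpus do not cause a division by zero.
    """
    n_docs = len(corpus)
    scores = {}
    for word in set(doc):
        tf = doc.count(word) / len(doc)
        df = sum(1 for other in corpus if word in other)
        scores[word] = tf * (log((1 + n_docs) / (1 + df)) + 1)
    return scores


def topic_weights(doc, corpus, topic_word_weights):
    """Weight each topic for one document (hypothetical scoring rule).

    topic_word_weights: {topic: {word: lda_weight}}, standing in for
    the topic-word weight table that an LDA model would produce.
    Each topic's raw score is sum(lda_weight * tf_idf) over its words;
    scores are then normalized to sum to 1.
    """
    scores = tf_idf(doc, corpus)
    raw = {
        topic: sum(w * scores.get(word, 0.0) for word, w in words.items())
        for topic, words in topic_word_weights.items()
    }
    total = sum(raw.values())
    return {t: (v / total if total else 0.0) for t, v in raw.items()}
```

For instance, `merge_same_topic_runs([("machine", 0), ("learning", 0), ("java", 1)])` groups the first two words into a single phrase, and the weights returned by `topic_weights` could then drive the size or position of each topic in a layout.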
Keywords/Search Tags: text mining, LDA, topic, job information, text visualization