
Research On Extraction Of Professional Individual Microblog Events

Posted on: 2016-09-22
Degree: Master
Type: Thesis
Country: China
Candidate: Z H Xiong
GTID: 2298330452471394
Subject: Computer technology
Abstract/Summary:
With the widespread development of computer application technology in recent years, the speed and volume of information transmitted over networks have grown explosively. Microblogs, a new medium distinct from traditional media such as television and radio, have generated a great deal of information on the highly popular Internet. Compared with traditional texts, microblogs are characterized by, among other things, short texts, rapidly changing topics, and internet slang. This makes it more convenient for people to express their views and feelings in real time, but it also produces a large amount of redundant information. This thesis deals with professional individual microblogs, which mainly discuss topics related to bloggers with professional knowledge. These topics are mostly confined to a professional field, but sometimes they also include public topics. Because microblog content is simple and mobile devices are widely used, people can post from their mobile devices whenever and wherever they like. As a result, a huge amount of data can be produced in a very short time, and people face rapidly growing volumes of network information. If this massive, unstructured microblog information were processed manually, the workload would be heavy, and it would also be difficult to pick out the items of interest quickly and accurately. Repeated experiments have shown that applying traditional algorithms to the extraction of professional individual microblog events does not give satisfactory results. The research trend in personal microblog information detection is therefore to find what people care about quickly within large volumes of disorganized microblog information.

To identify the specific interests of professional bloggers automatically, this thesis proposes an LDA-based algorithm for extracting professional individual microblog events. The algorithm first filters the microblog data, removing valueless text and emoticons as well as irrelevant links.
It then segments the text with the Institute of Computing Technology Chinese Lexical Analysis System (ICTCLAS), tags the part of speech of keywords, and removes stop words. Next, it uses the chi-square (CHI) statistic as a feature evaluation function to measure the importance of each keyword to each category. After the keywords are distributed uniformly across their corresponding categories, an improved TF-IDF algorithm is used to select keywords. The algorithm then applies LDA to model the corpus and mine the relationships between topics and their associated words. Words with higher weights reflect a blog's topic more distinctly, which helps both to compute the probability of different microblogs under the same topic and to calculate the overall similarity between microblogs, taking time similarity into account. Finally, an improved K-Means clustering algorithm gathers all microblogs on the same topic into one cluster, and the results are compared with manually labeled data. The experiments verify the effectiveness of the algorithm and show that it can present the blog events that interest people in a structured and logical way.
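The CHI feature evaluation and keyword weighting steps can be sketched as follows, assuming the standard chi-square statistic CHI(t, c) = N(AD − CB)² / ((A+C)(B+D)(A+B)(C+D)), where A, B, C and D count documents by whether they contain term t and whether they belong to category c. The abstract does not say how its TF-IDF is "improved", so multiplying TF, IDF and CHI together is an assumption, as are the function names `chi_square` and `tfidf_chi_weights`. Input documents are assumed to be already segmented (for example by ICTCLAS) into token lists.

```python
import math
from collections import Counter


def chi_square(docs, labels, term, category):
    """CHI statistic of `term` for `category` over tokenized documents."""
    a = b = c = d = 0
    for tokens, label in zip(docs, labels):
        has_term = term in tokens
        in_cat = label == category
        if has_term and in_cat:
            a += 1  # term present, document in the category
        elif has_term:
            b += 1  # term present, document in another category
        elif in_cat:
            c += 1  # term absent, document in the category
        else:
            d += 1  # term absent, document in another category
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return n * (a * d - c * b) ** 2 / denom if denom else 0.0


def tfidf_chi_weights(tokens, docs, labels, category):
    """Weight each term of one document by TF * IDF * CHI.

    The TF * IDF * CHI product is an assumed stand-in for the thesis's
    unspecified "improved TF-IDF"; IDF uses add-one smoothing.
    """
    counts = Counter(tokens)
    n_docs = len(docs)
    weights = {}
    for term, count in counts.items():
        df = sum(1 for doc in docs if term in doc)
        idf = math.log((n_docs + 1) / (df + 1)) + 1.0
        tf = count / len(tokens)
        weights[term] = tf * idf * chi_square(docs, labels, term, category)
    return weights


if __name__ == "__main__":
    docs = [["lda", "topic"], ["lda", "model"], ["football", "match"], ["match", "goal"]]
    labels = ["ml", "ml", "sport", "sport"]
    print(chi_square(docs, labels, "lda", "ml"))
    print(tfidf_chi_weights(docs[0], docs, labels, "ml"))
```

In this toy corpus "lda" occurs in every "ml" document and in no "sport" document, so it receives the maximum CHI score for that category and outweighs "topic" after weighting. The highest-weighted terms would then be passed on to the LDA modeling stage.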
Keywords/Search Tags: Professional individual microblog, LDA, Similarity, Event extraction