
Study On Information Extraction And Visualization For Web News Texts

Posted on: 2018-11-20
Degree: Master
Type: Thesis
Country: China
Candidate: K Fu
GTID: 2428330512981050
Subject: Management Science and Engineering
Abstract/Summary:
With the rapid development of computer and Internet technology, human society is producing data at massive scale. Text is an important form of this data, and its volume is soaring as well. In response, text mining technology, which aims to discover valuable information and knowledge in massive text data, has emerged. Web news texts contain a great deal of information that is not readily available: for instance, financial news reflects economic laws, and policy news reflects policy changes and their effects. Building on the basic theory and key technologies of information extraction, complex networks, and text visualization, this thesis analyses the features of web news texts and the information demands of users, and then studies information extraction methods and visual forms suited to web news texts. It verifies the feasibility of the proposed method, and obtains valuable conclusions, using web news texts about the national strategy "Internet plus" as research material. The main research work is as follows.

(1) Analysing the features of web news texts and the information demands of users. The thesis analyses the elements of a web news text (Who, When, Where, What, Why, and How), the inverted pyramid structure commonly adopted by news reports, and the information demands of users (such as the industries and companies associated with "Internet plus"). On this basis it determines which information entities need to be extracted, where they are located in the text, and the scheme for assigning weights to them.

(2) Extracting the information entities and establishing their relationship network. Information extraction for web news texts comprises extracting information entities and establishing the relationships between them. Following the concept hierarchy and a granularity standard for information entities, the thesis processes the entities by concept generalization and concept specialization. It then establishes the relationships between entities, assigns different weights according to the type of relationship, and expresses the resulting information entity association network as an association matrix.

(3) Proposing an Information Entity Rank (IERank) algorithm to rank information entities by importance. After extracting the information entities from every web news text and establishing the relationship network between them, the thesis applies the proposed IERank algorithm, which computes an IER value for every entity iteratively. The IER value represents the importance of the entity, and together the values give a quantitative importance ranking of all information entities.

(4) Building a basic framework for visualizing web news texts and providing suitable visual forms. Based on the information visualization reference model proposed by Card, the thesis describes the visualization procedure for web news texts. It combines visualization techniques based on vocabulary, semantic relations, subject fields, and time series, and adopts the network diagram, the timeline, and the geographical map as visual forms.

(5) Applying the method to web news texts about "Internet plus" and obtaining valuable conclusions. The thesis collects web news texts about "Internet plus", extracts the information entities, and establishes the relationships between them. It then calculates the importance ranking of all entities and carries out visual analysis with network diagrams, timelines, and geographical maps, from the perspectives of the association network, time series, and geographic location. This yields a series of conclusions about, among other things, the industry development and geographical distribution of "Internet plus".

In summary, this thesis analyses the features of web news texts and the information demands of users, studies methods for extracting information entities and establishing their relationships, proposes the IERank algorithm to rank entities by importance, and provides a basic framework for visualizing web news texts. Finally, it verifies the feasibility of the method and obtains valuable results using web news texts about "Internet plus" as research material. The thesis provides a complete framework with the corresponding techniques and methods, and has theoretical significance and application value in the field of news text mining.
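
The abstract does not give the IERank formula or the exact weighting scheme, so the following Python sketch is only an illustration of the general idea in steps (2) and (3): build a weighted association matrix from entities that co-occur in the same news text, then rank the entities by an iterative, PageRank-style update. Everything specific here is an assumption rather than the thesis's method: the function names `build_association` and `ierank_sketch`, the use of the product of position weights as the co-occurrence strength, the damping factor of 0.85, and the sample entities and weights.

```python
from collections import defaultdict
from itertools import combinations


def build_association(articles):
    """Build a symmetric weighted association matrix as nested dicts.

    `articles` is a list of news texts; each text is a list of
    (entity, position_weight) pairs. The position weight stands in for
    the abstract's weight-assignment scheme (for example, entities in
    the headline or lead weigh more than entities in the body).
    """
    weights = defaultdict(lambda: defaultdict(float))
    for entities in articles:
        for (a, wa), (b, wb) in combinations(entities, 2):
            if a == b:
                continue
            # Assumption: co-occurrence strength = product of position weights.
            weights[a][b] += wa * wb
            weights[b][a] += wa * wb
    return weights


def ierank_sketch(weights, damping=0.85, tol=1e-9, max_iter=200):
    """Iteratively score entities on a weighted association network.

    This is a weighted PageRank-style update used as a stand-in for
    IERank; the real algorithm may differ.
    """
    nodes = sorted(weights)
    n = len(nodes)
    if n == 0:
        return {}
    rank = {v: 1.0 / n for v in nodes}
    # Total association weight leaving each entity (never zero here,
    # because entities only enter the matrix in co-occurring pairs).
    out_total = {v: sum(weights[v].values()) for v in nodes}
    for _ in range(max_iter):
        new_rank = {}
        for v in nodes:
            incoming = sum(
                rank[u] * weights[u].get(v, 0.0) / out_total[u]
                for u in weights[v]
            )
            new_rank[v] = (1 - damping) / n + damping * incoming
        delta = sum(abs(new_rank[v] - rank[v]) for v in nodes)
        rank = new_rank
        if delta < tol:
            break
    return rank


# Hypothetical sample data: three news texts with made-up position weights.
sample_articles = [
    [("Internet plus", 1.0), ("finance", 0.5)],
    [("Internet plus", 1.0), ("logistics", 0.5)],
    [("Internet plus", 1.0), ("finance", 0.5), ("healthcare", 0.5)],
]
sample_ranks = ierank_sketch(build_association(sample_articles))
for entity, score in sorted(sample_ranks.items(), key=lambda kv: -kv[1]):
    print(f"{entity}: {score:.4f}")
```

In this sketch an entity scores highly when it is strongly associated with other high-scoring entities, which matches the abstract's description of an importance value obtained by iterative calculation over the relationship network.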
Keywords/Search Tags:Web News Text, Information Extraction, Text Visualization, IERank Algorithm