Research On Key Technologies Of Data Organization For Web News Based On Spatio-Temporal Information

Posted on: 2017-09-26
Degree: Master
Type: Thesis
Country: China
Candidate: M Y Ma
GTID: 2428330569999053
Subject: Information and Communication Engineering
Abstract/Summary:
In the era of big data, news is published on the Internet continuously, and it carries rich spatial and temporal information. Analyzing web news on the basis of this information can provide information gathering and decision support services for governments at different levels, enterprises, and individuals. Organizing web news data by its spatio-temporal information is the foundation for such analysis. This thesis studies the extraction and normalization of spatio-temporal information in web news and designs a data organization model for web news based on that information. The main contributions are as follows.

1) Extracting and normalizing the spatio-temporal information in web news first requires an annotation scheme, which also facilitates information exchange and resource sharing. Building on existing spatio-temporal annotation methods and on the way web news describes time and place, this thesis designs a spatio-temporal information annotation system for web news that uses XML as its standard format.

2) This thesis presents a rule-based method for extracting and normalizing spatio-temporal information in web news. First, a spatio-temporal knowledge base is constructed according to the characteristics of spatio-temporal descriptions in web news. The knowledge base stores the mapping from temporal and spatial expressions to specific times and spatial locations. On this basis, a "down-cover" spatio-temporal information extraction and normalization method is proposed. The method is applied on a Spark cluster to extract the temporal and spatial information of large-scale web news.

3) This thesis presents a Geohash-based method for representing spatial location information. Geohash is an encoding that expresses two-dimensional spatial coordinates as a one-dimensional code. An adaptive Geohash coding method for spatial data is proposed, in which the location of each spatial object is represented by 1 to 4 Geohash codes. Based on Geohash, the thesis presents a data organization model for web news based on spatio-temporal information. Experiments show that the model performs well in spatial information retrieval.

4) This thesis designs a prototype system for the spatio-temporal analysis of web news. The system supports news retrieval and news recommendation based on time and space, analysis of hot keywords within different temporal and spatial ranges, and analysis of event trends.
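To illustrate how Geohash turns a two-dimensional coordinate into a one-dimensional code, the following is a minimal sketch of the standard Geohash encoding in Python. It is not the adaptive coding method proposed in the thesis, whose details the abstract does not give; the function name `geohash_encode` and its `precision` parameter are assumptions made for this example.

```python
# Standard Geohash alphabet: digits and lowercase letters without a, i, l, o.
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lat: float, lon: float, precision: int = 6) -> str:
    """Encode a latitude/longitude pair as a Geohash string.

    Illustrative helper (hypothetical name), implementing the plain
    Geohash scheme: longitude and latitude ranges are halved alternately,
    each halving yields one bit, and every 5 bits become one character.
    """
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits = 0          # bits accumulated for the current character
    bit_count = 0     # number of bits accumulated so far (0..5)
    use_lon = True    # Geohash starts with a longitude bit

    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits = (bits << 1) | 1
                lon_lo = mid
            else:
                bits <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits = (bits << 1) | 1
                lat_lo = mid
            else:
                bits <<= 1
                lat_hi = mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:
            chars.append(BASE32[bits])
            bits = 0
            bit_count = 0

    return "".join(chars)


if __name__ == "__main__":
    # A longer code for the same point extends the shorter one,
    # so nearby locations tend to share a common prefix.
    print(geohash_encode(42.6, -5.6, precision=5))
    print(geohash_encode(42.6, -5.6, precision=8))
```

Because a longer code refines the cell of a shorter one, a spatial range query can often be reduced to a prefix match on the code, which is one plausible reason such codes suit spatial retrieval over news records.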
Keywords/Search Tags: Web news, Spatio-temporal information, Information annotation, Information extraction, Geohash, Spatio-temporal data model