
Research On Construction Of Special Corpus Of Air Pollution And Spatialization Of Corpus

Posted on: 2021-09-08    Degree: Master    Type: Thesis
Country: China    Candidate: P F Song    Full Text: PDF
GTID: 2491306032966219    Subject: Cartography and Geographic Information System
Abstract/Summary:
In recent years, growing national and public awareness of environmental protection and the implementation of strong environmental protection measures have improved air quality, but public expectations for a better living environment have risen as well. The current air quality situation increasingly falls short of these expectations, and local air pollution incidents still occur frequently. At present, air quality monitoring in China relies mainly on 1,436 state-controlled monitoring sites and 96 regional air quality monitoring sites. Most of these sites are located in cities and are sparsely distributed, so frequent local or regional air pollution events are poorly monitored. Collecting reports of air pollution incidents from online public opinion can effectively compensate for the local incidents that the sparse fixed monitoring stations miss or fail to detect. However, Internet data sources are diverse and cover many unrelated topics. This paper therefore studied the construction of a special corpus of air pollution, and methods for extracting it, based on air pollution public opinion data from Weibo and public complaint data.

Because Internet corpora contain a wide variety of loosely organized information, we proposed a method for extracting an air pollution thematic corpus that combines incremental target sentences with sentiment orientation analysis. Based on the 2013-2019 primary pollutant data for air quality released by the Ministry of Ecology and Environment, a comprehensive target sentence was selected as the reference against which all data were compared. Word2Vec was used to train word vectors on the segmented text. Using the trained vector model, each piece of public opinion information was compared with the initial target sentence by computing the cosine of the angle between their vectors, and this cosine value served as the similarity value. The similarity values were analyzed and compared, and an empirically chosen threshold was used for a first extraction of the public opinion corpus. For texts that the first-pass threshold could not separate, we studied the conceptual relationships of air pollution and built a conceptual system of air pollution to analyze pollution types; this system served as the basis for selecting incremental target sentences for a second extraction. At the same time, a dictionary-based sentiment analysis method was used to compute the sentiment of the extracted corpus and to filter out texts that relate to the atmosphere but not to air pollution events. Experiments showed that this extraction method, which combines incremental target sentence similarity comparison with sentiment analysis, effectively extracted more than 400,000 air pollution thematic texts from more than 5 million public opinion posts on social platforms.

For the extracted corpus, the paper also proposed a method, based on natural language processing, for spatializing air quality public opinion information. Using Chinese word segmentation, part-of-speech tagging and other methods, addresses were extracted from public air pollution complaint data. By combining the extracted address elements, the complaint locations were matched to geographic positions, and the key complaint areas in Shandong Province were spatialized as a heat map. Comparison with air quality monitoring data from the national control stations showed that the key areas of public complaints were highly consistent with the key pollution areas monitored by those stations. The results show that the public perceives air quality directly and can effectively reflect local air pollution at finer spatial and temporal scales, providing a robust supplement to the monitoring data of the national control stations.
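The first extraction pass described above can be sketched as follows. This is a minimal illustration under assumptions the abstract does not state: a sentence vector is taken as the average of its word vectors, the tiny `WORD_VECTORS` table is a made-up stand-in for a trained Word2Vec model, and the two thresholds `accept` and `reject` (with an ambiguous band between them) are hypothetical placeholders for the empirically chosen threshold.

```python
import math

# Hypothetical 3-d word vectors standing in for a trained Word2Vec model.
# The values are invented for illustration only.
WORD_VECTORS = {
    "haze": [0.9, 0.1, 0.0],
    "smog": [0.8, 0.2, 0.1],
    "pm2.5": [0.85, 0.05, 0.1],
    "pollution": [0.7, 0.3, 0.0],
    "sunny": [0.0, 0.2, 0.9],
    "concert": [0.1, 0.9, 0.2],
}


def sentence_vector(tokens):
    """Average the vectors of in-vocabulary tokens (assumed pooling strategy)."""
    vecs = [WORD_VECTORS[t] for t in tokens if t in WORD_VECTORS]
    if not vecs:
        return None
    dim = len(vecs[0])
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]


def cosine(a, b):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def first_pass(posts, target_tokens, accept=0.85, reject=0.5):
    """Sort tokenized posts into accepted / ambiguous / rejected by cosine score."""
    target = sentence_vector(target_tokens)
    result = {"accepted": [], "ambiguous": [], "rejected": []}
    for tokens in posts:
        vec = sentence_vector(tokens)
        score = cosine(vec, target) if vec else 0.0
        if score >= accept:
            result["accepted"].append(tokens)
        elif score >= reject:
            result["ambiguous"].append(tokens)
        else:
            result["rejected"].append(tokens)
    return result
```

With these toy vectors and the target `["haze", "pm2.5", "pollution"]`, the post `["haze", "smog"]` lands in `accepted`, `["pollution", "sunny"]` in `ambiguous` and `["sunny", "concert"]` in `rejected`; only the ambiguous group would go on to a second pass.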
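The second, incremental extraction pass might look roughly like this: each text left undecided by the first pass is compared with one target sentence per pollution type and assigned to the best match if it is similar enough. Assumptions: `CONCEPT_TARGETS` is a hypothetical fragment of the air pollution concept system; for brevity, similarity is computed on term-count (bag-of-words) vectors instead of Word2Vec vectors; and the 0.4 threshold is illustrative only.

```python
import math
from collections import Counter

# Hypothetical fragment of an air pollution concept system: each pollution
# type contributes one incremental target sentence (already tokenized).
CONCEPT_TARGETS = {
    "straw_burning": ["straw", "burning", "smoke", "village"],
    "industrial": ["factory", "chimney", "emission", "smell"],
    "dust": ["construction", "site", "dust", "road"],
}


def bow_cosine(a_tokens, b_tokens):
    """Cosine similarity of term-count vectors (stand-in for Word2Vec vectors)."""
    a, b = Counter(a_tokens), Counter(b_tokens)
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def second_pass(ambiguous_posts, targets=CONCEPT_TARGETS, threshold=0.4):
    """Assign each ambiguous post to its most similar pollution type, if any."""
    assigned = []
    for tokens in ambiguous_posts:
        best_type, best_score = None, 0.0
        for ptype, target in targets.items():
            score = bow_cosine(tokens, target)
            if score > best_score:
                best_type, best_score = ptype, score
        # Posts below the threshold for every pollution type are dropped.
        if best_score >= threshold:
            assigned.append((tokens, best_type, round(best_score, 3)))
    return assigned
```

For example, `["factory", "chimney", "smell", "night"]` shares three terms with the `industrial` target (both vectors have norm 2), so it is assigned to `industrial` with a score of 0.75, while `["road", "trip", "weekend"]` scores below the threshold everywhere and is dropped. Adding a new pollution type only requires adding one more target sentence, which is what makes the scheme incremental.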
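A dictionary-based sentiment score of the kind mentioned in the abstract can be sketched with polarity, negation and degree-adverb lexicons. The lexicons below are hypothetical English stand-ins for real Chinese sentiment dictionaries, the scoring rule (degree adverbs multiply, and negations flip, the next sentiment word) is a common textbook scheme rather than the thesis's documented formula, and keeping only negatively scored posts is an assumed filtering rule.

```python
# Hypothetical mini-lexicons; a real system would load full sentiment,
# negation and degree-adverb dictionaries.
POSITIVE = {"clear": 1.0, "fresh": 1.0, "beautiful": 1.0}
NEGATIVE = {"choking": -1.0, "stink": -1.0, "terrible": -1.0, "smoggy": -1.0}
NEGATIONS = {"not", "no", "never"}
DEGREE = {"very": 1.5, "extremely": 2.0, "slightly": 0.5}


def sentiment_score(tokens):
    """Sum word polarities, scaled by preceding degree adverbs and negations."""
    score = 0.0
    weight, flip = 1.0, 1.0
    for tok in tokens:
        if tok in DEGREE:
            weight *= DEGREE[tok]
        elif tok in NEGATIONS:
            flip *= -1.0
        elif tok in POSITIVE or tok in NEGATIVE:
            polarity = POSITIVE.get(tok, NEGATIVE.get(tok))
            score += polarity * weight * flip
            # Modifiers apply only to the next sentiment word.
            weight, flip = 1.0, 1.0
    return score


def keep_pollution_posts(posts):
    """Assumed filter: keep only posts whose overall sentiment is negative."""
    return [p for p in posts if sentiment_score(p) < 0]
```

Here `sentiment_score(["air", "extremely", "smoggy", "today"])` returns `-2.0` and is kept, whereas `["sky", "very", "clear"]` scores `1.5` and would be filtered out as weather talk rather than a pollution complaint.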
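Address extraction from segmented, part-of-speech-tagged complaint text can be sketched as joining runs of place-name tokens. The sketch assumes the input is already tagged with the ICTCLAS/jieba-style tag `ns` for place names (the example tags are written by hand, not produced by a segmenter), and `ADDRESS_SUFFIXES` is a hypothetical list of address-unit endings that may extend a place-name run.

```python
# Tokens are (word, tag) pairs from a Chinese segmenter with POS tagging;
# "ns" (place name) follows the ICTCLAS/jieba tag set.
PLACE_TAGS = {"ns"}
# Hypothetical address-unit suffixes that may follow a place name.
ADDRESS_SUFFIXES = ("路", "街", "村", "镇", "小区", "号")


def extract_addresses(tagged_tokens):
    """Join each run of consecutive place-name tokens into one address string."""
    addresses, run = [], []
    for word, tag in tagged_tokens:
        # Start or extend a run on a place name; extend (never start) on a suffix.
        if tag in PLACE_TAGS or (run and word.endswith(ADDRESS_SUFFIXES)):
            run.append(word)
        else:
            if run:
                addresses.append("".join(run))
            run = []
    if run:
        addresses.append("".join(run))
    return addresses
```

For a hand-tagged complaint such as 济南市/ns 历城区/ns 工业北路/ns 附近/f 化工厂/n 排放/v, the function returns `["济南市历城区工业北路"]`; the non-place token 附近 ends the run.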
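Address matching can then be sketched as a most-specific-first lookup against a gazetteer, falling back to coarser administrative levels when the full address is not found. `GAZETTEER` and its coordinates are illustrative placeholders, not surveyed values; a real system would query a geocoding service or an administrative-division database.

```python
# Hypothetical gazetteer: normalized address -> (longitude, latitude).
# Coordinates are rough illustrative placeholders.
GAZETTEER = {
    "济南市": (117.00, 36.65),
    "济南市历城区": (117.07, 36.68),
    "淄博市": (118.05, 36.81),
}


def match_address(components, gazetteer=GAZETTEER):
    """Return (address, coords, depth) for the most specific gazetteer hit.

    components: ordered address elements, coarsest first,
    e.g. ["济南市", "历城区", "工业北路"]. Returns None if nothing matches.
    """
    for depth in range(len(components), 0, -1):
        candidate = "".join(components[:depth])
        if candidate in gazetteer:
            return candidate, gazetteer[candidate], depth
    return None
```

Because 工业北路 is not in this toy gazetteer, `match_address(["济南市", "历城区", "工业北路"])` falls back to the district level and returns `("济南市历城区", (117.07, 36.68), 2)`; the returned depth records how precise each matched complaint point is.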
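The abstract does not say how the heat map was rendered; a Gaussian kernel density estimate over the matched complaint points is one common choice, sketched below. The grid extent, `nx`, `ny` and `bandwidth` are hypothetical parameters, and coordinates are treated as planar, so real longitude/latitude data should first be projected to a metric coordinate system.

```python
import math


def kernel_density_grid(points, xmin, xmax, ymin, ymax, nx, ny, bandwidth):
    """Gaussian kernel density of complaint points on an ny-by-nx grid.

    grid[j][i] is the density at x = xmin + i*dx, y = ymin + j*dy.
    Requires at least one point and nx, ny >= 2.
    """
    norm = 2 * math.pi * bandwidth ** 2 * len(points)  # makes the surface integrate to 1
    grid = []
    for j in range(ny):
        y = ymin + (ymax - ymin) * j / (ny - 1)
        row = []
        for i in range(nx):
            x = xmin + (xmax - xmin) * i / (nx - 1)
            total = 0.0
            for px, py in points:
                r2 = (x - px) ** 2 + (y - py) ** 2
                total += math.exp(-r2 / (2 * bandwidth ** 2))
            row.append(total / norm)
        grid.append(row)
    return grid
```

The bandwidth controls how far each complaint "spreads" on the map: a small value highlights individual complaint points, a large one blends them into regional hot spots, so it is usually tuned to the spatial scale of interest.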
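To quantify the consistency between complaint hot spots and monitored pollution, one option (an assumption, since the abstract reports only a qualitative comparison) is Spearman's rank correlation between per-area complaint counts and per-area pollutant concentrations from the national control stations. The sketch computes it as the Pearson correlation of average ranks, which handles ties.

```python
import math


def average_ranks(values):
    """Rank values (1 = smallest), giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman rank correlation, e.g. complaints per city vs. mean PM2.5."""
    return pearson(average_ranks(x), average_ranks(y))
```

A value near 1 would indicate that areas with more complaints also tend to have higher monitored concentrations; rank correlation is used here because complaint counts are typically skewed and only their ordering is comparable with pollutant levels.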
Keywords/Search Tags: Corpus construction, Text similarity calculation, Sentiment analysis, Natural language processing, Address matching, Spatialization