
Research on Hail Information Extraction Based on Sina Weibo

Posted on: 2017-02-13  Degree: Master  Type: Thesis
Country: China  Candidate: H Y Wang  Full Text: PDF
GTID: 2348330512477556  Subject: Control engineering
Abstract/Summary:
Hail is a highly destructive form of severe weather that causes great damage, so research on hail is necessary. Some work already exists on hail identification and prediction, but the prediction results must be verified against actual hail events. The traditional way of collecting and recording hail events relies on dedicated meteorological personnel and is therefore limited in time and region. To obtain the data more conveniently and quickly, we turn from the traditional method to the internet. Sina Micro-blog (Sina Weibo) is the largest online micro-blog platform in China and has the most active users in the country. In addition, because hail is a rare extreme weather event, people tend to publish information about it online, so we choose Sina Micro-blog as the data source. There are three ways to acquire data from Sina Micro-blog: the first relies on third-party software, the second on the Sina API, and the third on a web crawler. Because the advanced search interface of Sina Micro-blog is needed and Sina provides no public access to that interface, web crawler technology is adopted to collect the texts containing "hail". However, not all micro-blog texts containing "hail" report hail events. Among these texts, some describe the occurrence of a hail event, some contain weather forecast information, and the rest contain neither. To identify the texts that describe hail events among all the texts containing "hail", a text classification method is applied, and a labeled sample space is established over the three kinds of micro-blog texts. The main step of text classification is text feature extraction, for which many methods exist. In this paper each method is improved, and then all the methods are combined to obtain the best performance. Phrases and parts of speech, not only individual words, are also used as textual features in order to capture all the useful information. We use Bayes, K-Neighbors, and support vector machine classifiers to make predictions at the same time, and then combine the prediction results of the three classifiers. The test results show that the presented method achieves a hail event extraction rate of 89.5% with a mistaken identification rate of less than 13.4%. Finally, a matching method based on grammar rules is used to extract the time, location, and size information of hail events from the micro-blog texts that describe hail events.
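The abstract does not say how the predictions of the three classifiers are combined. As an illustration only, the sketch below assumes a simple majority vote over the three class labels, falling back to one designated classifier when all three disagree. The label names, the function names, and the fallback rule are hypothetical and are not taken from the thesis.

```python
from collections import Counter


def majority_vote(predictions, fallback_index=0):
    """Combine one label per classifier into a single label.

    Assumption: a label wins if more than half of the classifiers chose it.
    If no label has a majority (all three disagree), the prediction of the
    classifier at `fallback_index` is used instead.
    """
    counts = Counter(predictions)
    label, votes = counts.most_common(1)[0]
    if votes > len(predictions) / 2:
        return label
    return predictions[fallback_index]


def combine(bayes, knn, svm):
    """Apply the vote post by post to three parallel lists of labels."""
    return [majority_vote(triple) for triple in zip(bayes, knn, svm)]


# Hypothetical labels for the three kinds of posts named in the abstract:
# "hail_event", "forecast", and "other". The predictions are made-up inputs.
bayes_pred = ["hail_event", "forecast", "other", "hail_event"]
knn_pred = ["hail_event", "other", "other", "forecast"]
svm_pred = ["other", "forecast", "other", "other"]

combined = combine(bayes_pred, knn_pred, svm_pred)
print(combined)
```

In this toy input the first three posts are decided by a two-to-one or unanimous vote, while the fourth has no majority and takes the label of the fallback classifier. Other combination rules, such as weighting each classifier by its validation accuracy, would fit the same interface.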
Keywords/Search Tags:Sina Micro-blog, hail information, feature selection, text classification, text elements recognition, web crawler