
Street View Image Classification Based on Attention Mechanism

Posted on: 2023-03-29
Degree: Master
Type: Thesis
Country: China
Candidate: T Yu
GTID: 2568306833982139
Subject: Information and Communication Engineering
Abstract/Summary:
Unlike satellite or aerial remote sensing images, street view images provide visual information with rich ground-level detail, and they are gradually becoming one of the most important sources of near-ground remote sensing data because of their wide distribution, ease of access, accurate geolocation, and regular updates. Many free street view image services, such as Google Street View (GSV) and Bing Streetside, which span over 200 countries, play an essential role in urban functional zoning, land use evaluation, and the observation of human social activity. Researchers have labeled street view images with land use attributes using associated information such as geographic location, which recasts urban land use analysis as an image classification problem. Convolutional Neural Networks (CNNs) remain the most widely used approach for classifying street view images. However, for street view category labels with highly abstract semantics (e.g., "commercial"), it is difficult to improve classification accuracy further. There are two main reasons. First, the limited receptive field of a convolutional kernel makes it hard to capture global information, so its representational ability is poor for scene images with rich contextual information. Second, although CNN models can extract global information through their deeper layers, convolutional kernels learn statistical feature patterns from a large number of training samples and cannot adaptively focus on key features, which makes it difficult to localize the most discriminative features when the labels are highly abstract. To address these problems, this thesis improves street view image classification by introducing different attention mechanisms, formulating inductive biases specific to street view images, such as more dispersed visually salient regions and the semantic decomposability of highly abstract labels, as concrete attention models. The main work of this thesis is as follows:

(1) BEAUTY (Building dEtection And Urban funcTional zone portraYing), a dataset of doubly annotated street view images, is created. The dataset was built from an existing street view image collection through data cleaning and expert opinion. It is annotated at two levels, the street view image and the building object: each image has an urban land use label (residential, commercial, industrial, or public), and each building object in an image is marked with a bounding box giving its position and category (8 categories of building objects). The double annotation supports the two related tasks of building detection and street view image classification at the same time, and it helps investigate the relationship between visually salient objects at a low semantic level and urban land use categories at a high semantic level.

(2) A street view image classification model based on CNN-Transformer (C-Trans) is proposed. The model uses a CNN module to extract primary image features from the street view image, then a Transformer module to obtain attention-focused features with contextual information, and finally a visual word merging module to produce a more compact feature for the final classification. Experimental results on three street view datasets show that C-Trans improves substantially over the plain CNN model and over other CNN models augmented with attention mechanisms. C-Trans is also slightly more effective than representative vision Transformer models, with greatly reduced computation.

(3) A two-stage feature adaptive weighting model for street view image classification is proposed. The model transforms the street view classification problem into a scene analysis problem: it incorporates different attention-based adaptive weighting techniques into the two-stage framework of "building detection and layout encoding classification," aggregating the low-level semantics extracted by visual detectors into high-level semantics to improve street view classification performance. To improve the detector, a feature adaptive weighting module guided by local self-correlation is incorporated in the detection stage. Sequence information is then obtained by ordering the detected building objects with a centroid sorting algorithm. Finally, the sequence information is processed by a feature weighting module guided by local cross-correlation to produce the classification result. Experiments show that the FAWNet proposed in this thesis improves considerably over the previous two-stage model and classifies street view images effectively.
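As a rough illustration of the centroid sorting step mentioned in (3), the sketch below orders detected building boxes by the centroid of each bounding box to form a sequence. The abstract does not state the sort key, the box format, or any identifiers, so the left-to-right ordering, the `Detection` container, the functions `centroid` and `sort_by_centroid`, and the sample category names are all assumptions made for illustration, not the thesis's implementation.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Detection:
    """One detected building object (hypothetical container, not from the thesis)."""
    category: str
    x_min: float
    y_min: float
    x_max: float
    y_max: float


def centroid(det: Detection) -> Tuple[float, float]:
    """Return the (x, y) center of the bounding box."""
    return ((det.x_min + det.x_max) / 2.0, (det.y_min + det.y_max) / 2.0)


def sort_by_centroid(dets: List[Detection]) -> List[Detection]:
    """Order detections by centroid x, breaking ties by centroid y.

    Assumption: this left-to-right key is illustrative; the source does
    not specify how centroids are compared.
    """
    return sorted(dets, key=centroid)


# Hypothetical detections in pixel coordinates; category names are made up.
detections = [
    Detection("shop", 300, 40, 420, 200),
    Detection("apartment", 10, 20, 150, 260),
    Detection("office", 160, 0, 290, 280),
]

# The ordered category sequence is what a later classification stage would consume.
sequence = [d.category for d in sort_by_centroid(detections)]
print(sequence)
```

Sorting on the centroid tuple keeps the ordering deterministic when two boxes share the same horizontal center; a different key (for example, distance from the image center) could be substituted by changing only `sort_by_centroid`.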
Keywords/Search Tags: street view classification, urban land use classification, attention mechanism, CNN, Transformer