Method Details


Details for method 'Hierarchical Multi-Scale Attention for Semantic Segmentation'

 

Method overview

name Hierarchical Multi-Scale Attention for Semantic Segmentation
challenge pixel-level semantic labeling
details Multi-scale inference is commonly used to improve the results of semantic segmentation. Multiple image scales are passed through a network and then the results are combined with averaging or max pooling. In this work, we present an attention-based approach to combining multi-scale predictions. We show that predictions at certain scales are better at resolving particular failure modes and that the network learns to favor those scales for such cases in order to generate better predictions. Our attention mechanism is hierarchical, which enables it to be roughly 4x more memory efficient to train than other recent approaches. In addition to enabling faster training, this allows us to train with larger crop sizes, which leads to greater model accuracy. We demonstrate the results of our method on two datasets: Cityscapes and Mapillary Vistas. For Cityscapes, which has a large number of weakly labelled images, we also leverage auto-labelling to improve generalization. Using our approach we achieve new state-of-the-art results on both Mapillary (61.1 IOU val) and Cityscapes (85.4 IOU test).
publication Hierarchical Multi-Scale Attention for Semantic Segmentation
Andrew Tao, Karan Sapra, Bryan Catanzaro
https://arxiv.org/abs/2005.10821
project page / code https://github.com/NVIDIA/semantic-segmentation
used Cityscapes data fine annotations, coarse annotations
used external data Mapillary
runtime n/a
subsampling no
submission date May, 2020
previous submissions 1
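
The description above contrasts averaging and max pooling with attention-based combination of multi-scale predictions. The following is a minimal NumPy sketch of that contrast, not the authors' implementation (see the project page for that). It assumes the per-scale predictions have already been resized to a common resolution, and it uses a flat per-pixel softmax over scales as a stand-in for learned attention; the hierarchical chaining of scales is not shown. The function names and array shapes are hypothetical.

```python
import numpy as np

def combine_average(preds):
    """Average class scores over the scale axis. preds: (scales, classes, H, W)."""
    return np.mean(preds, axis=0)

def combine_max(preds):
    """Take the per-pixel, per-class maximum over the scale axis."""
    return np.max(preds, axis=0)

def combine_attention(preds, attn_logits):
    """Weight each scale per pixel. attn_logits: (scales, H, W).

    Assumption: a softmax over scales turns the logits into weights that
    sum to 1 at every pixel. In a trained model these logits would come
    from a learned attention head; here they are just an input.
    """
    shifted = attn_logits - attn_logits.max(axis=0, keepdims=True)
    weights = np.exp(shifted)
    weights /= weights.sum(axis=0, keepdims=True)
    # Broadcast the weights over the class axis, then sum over scales.
    return (weights[:, None, :, :] * preds).sum(axis=0)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    preds = rng.random((3, 19, 4, 4))    # 3 scales, 19 classes, 4x4 pixels
    logits = rng.normal(size=(3, 4, 4))  # stand-in for learned attention
    fused = combine_attention(preds, logits)
    print(fused.shape)                   # (19, 4, 4)
```

With uniform logits the attention weights are equal across scales, so the result reduces to plain averaging; non-uniform logits let individual pixels favor one scale over another.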

 

Average results

Metric Value
IoU Classes 85.4336
iIoU Classes 70.4246
IoU Categories 93.1669
iIoU Categories 85.3891

 

Class results

Class IoU iIoU
road 98.9751 -
sidewalk 89.3836 -
building 94.904 -
wall 71.8393 -
fence 68.3844 -
pole 75.8568 -
traffic light 82.181 -
traffic sign 85.2755 -
vegetation 94.492 -
terrain 74.9706 -
sky 96.3065 -
person 90.1457 78.3944
rider 79.7149 63.5033
car 96.9621 92.6606
truck 82.5808 58.6633
bus 94.6009 70.0064
train 87.8013 65.6089
motorcycle 77.1554 62.2876
bicycle 81.7092 72.2725

 

Category results

Category IoU iIoU
flat 98.9131 -
nature 94.2904 -
object 80.8615 -
sky 96.3065 -
construction 94.9411 -
human 90.2188 79.2657
vehicle 96.637 91.5125
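
The averages reported under "Average results" appear to be unweighted means of the per-class and per-category values in the tables above. The short check below reproduces them; the variable and function names are illustrative only, and entries shown as "-" (no iIoU reported) are simply left out of the iIoU means.

```python
from statistics import mean

# Values copied from the class and category tables above.
class_iou = {
    "road": 98.9751, "sidewalk": 89.3836, "building": 94.904,
    "wall": 71.8393, "fence": 68.3844, "pole": 75.8568,
    "traffic light": 82.181, "traffic sign": 85.2755,
    "vegetation": 94.492, "terrain": 74.9706, "sky": 96.3065,
    "person": 90.1457, "rider": 79.7149, "car": 96.9621,
    "truck": 82.5808, "bus": 94.6009, "train": 87.8013,
    "motorcycle": 77.1554, "bicycle": 81.7092,
}
class_iiou = {
    "person": 78.3944, "rider": 63.5033, "car": 92.6606,
    "truck": 58.6633, "bus": 70.0064, "train": 65.6089,
    "motorcycle": 62.2876, "bicycle": 72.2725,
}
category_iou = {
    "flat": 98.9131, "nature": 94.2904, "object": 80.8615,
    "sky": 96.3065, "construction": 94.9411,
    "human": 90.2188, "vehicle": 96.637,
}
category_iiou = {"human": 79.2657, "vehicle": 91.5125}

def average(scores):
    """Unweighted mean of the listed scores, rounded to four decimals."""
    return round(mean(scores.values()), 4)

print("IoU Classes    ", average(class_iou))      # 85.4336
print("iIoU Classes   ", average(class_iiou))     # 70.4246
print("IoU Categories ", average(category_iou))   # 93.1669
print("iIoU Categories", average(category_iiou))  # 85.3891
```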

 

Links

Download results as .csv file

Benchmark page