Semantic Understanding of Urban Street Scenes

Method Details

Details for method 'Hierarchical Multi-Scale Attention for Semantic Segmentation'

Method overview

name	Hierarchical Multi-Scale Attention for Semantic Segmentation
challenge	pixel-level semantic labeling
details	Multi-scale inference is commonly used to improve the results of semantic segmentation. Multiple image scales are passed through a network and then the results are combined with averaging or max pooling. In this work, we present an attention-based approach to combining multi-scale predictions. We show that predictions at certain scales are better at resolving particular failure modes and that the network learns to favor those scales for such cases in order to generate better predictions. Our attention mechanism is hierarchical, which enables it to be roughly 4x more memory efficient to train than other recent approaches. In addition to enabling faster training, this allows us to train with larger crop sizes, which leads to greater model accuracy. We demonstrate the results of our method on two datasets: Cityscapes and Mapillary Vistas. For Cityscapes, which has a large number of weakly labelled images, we also leverage auto-labelling to improve generalization. Using our approach we achieve new state-of-the-art results on both Mapillary (61.1 IOU val) and Cityscapes (85.4 IOU test).
publication	Hierarchical Multi-Scale Attention for Semantic Segmentation. Andrew Tao, Karan Sapra, Bryan Catanzaro. https://arxiv.org/abs/2005.10821
project page / code	https://github.com/NVIDIA/semantic-segmentation
used Cityscapes data	fine annotations, coarse annotations
used external data	Mapillary
runtime	n/a
subsampling	no
submission date	May, 2020
previous submissions	1
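
The details field above describes combining predictions from several image scales with a learned attention map instead of averaging or max pooling. The following is a minimal NumPy sketch of how such a per-pixel blend could look, not the code from the linked repository. It assumes each scale is exactly 2x the previous one, that the attention maps are already computed and lie in [0, 1], and it uses nearest-neighbour upsampling as a simple stand-in. The function names `upsample2x`, `fuse_two_scales`, and `fuse_hierarchy` are hypothetical.

```python
import numpy as np


def upsample2x(x):
    """Nearest-neighbour 2x upsampling over the last two (H, W) axes.

    Assumption: a stand-in for whatever interpolation a real model uses.
    """
    return x.repeat(2, axis=-2).repeat(2, axis=-1)


def fuse_two_scales(logits_lo, attn_lo, logits_hi):
    """Blend a lower-scale and a higher-scale prediction per pixel.

    logits_lo: (C, H/2, W/2) class scores at the lower scale
    attn_lo:   (1, H/2, W/2) attention weights in [0, 1] at the lower scale
    logits_hi: (C, H, W) class scores at the higher scale
    """
    weighted_lo = upsample2x(logits_lo * attn_lo)  # trust in the lower scale
    attn_up = upsample2x(attn_lo)                  # same weights at (H, W)
    return weighted_lo + (1.0 - attn_up) * logits_hi


def fuse_hierarchy(logits_by_scale, attn_by_scale):
    """Chain the pairwise blend from the coarsest to the finest scale.

    logits_by_scale: list of (C, H_i, W_i) arrays, coarsest first,
                     each twice the resolution of the previous one
    attn_by_scale:   one (1, H_i, W_i) attention map for every scale
                     except the finest
    """
    fused = logits_by_scale[0]
    for attn, logits_next in zip(attn_by_scale, logits_by_scale[1:]):
        fused = fuse_two_scales(fused, attn, logits_next)
    return fused


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    scales = [rng.normal(size=(19, 8 * 2**i, 16 * 2**i)) for i in range(3)]
    attns = [rng.uniform(size=(1, 8 * 2**i, 16 * 2**i)) for i in range(2)]
    print(fuse_hierarchy(scales, attns).shape)
```

In this sketch an attention value of 1 keeps only the lower-scale prediction at that pixel and a value of 0 keeps only the higher-scale one. Because only adjacent scale pairs are blended, further scales can be chained in without changing the pairwise step.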

Average results

Metric	Value (%)
IoU Classes	85.4336
iIoU Classes	70.4246
IoU Categories	93.1669
iIoU Categories	85.3891

Class results

Class	IoU (%)	iIoU (%)
road	98.9751	-
sidewalk	89.3836	-
building	94.904	-
wall	71.8393	-
fence	68.3844	-
pole	75.8568	-
traffic light	82.181	-
traffic sign	85.2755	-
vegetation	94.492	-
terrain	74.9706	-
sky	96.3065	-
person	90.1457	78.3944
rider	79.7149	63.5033
car	96.9621	92.6606
truck	82.5808	58.6633
bus	94.6009	70.0064
train	87.8013	65.6089
motorcycle	77.1554	62.2876
bicycle	81.7092	72.2725

Category results

Category	IoU (%)	iIoU (%)
flat	98.9131	-
nature	94.2904	-
object	80.8615	-
sky	96.3065	-
construction	94.9411	-
human	90.2188	79.2657
vehicle	96.637	91.5125
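
The four figures under "Average results" appear to be unweighted means of the per-class and per-category rows, with iIoU averaged only over the rows that report it. The short check below recomputes them from the tables on this page. The variable names are illustrative, and the values are copied from the tables rather than produced by any official evaluation script.

```python
from statistics import fmean

# Per-class IoU values, in table order (road ... bicycle).
class_iou = [
    98.9751, 89.3836, 94.904, 71.8393, 68.3844, 75.8568, 82.181,
    85.2755, 94.492, 74.9706, 96.3065, 90.1457, 79.7149, 96.9621,
    82.5808, 94.6009, 87.8013, 77.1554, 81.7092,
]
# Per-class iIoU values, reported only for person ... bicycle.
class_iiou = [
    78.3944, 63.5033, 92.6606, 58.6633, 70.0064, 65.6089, 62.2876, 72.2725,
]
# Per-category IoU values (flat ... vehicle).
category_iou = [98.9131, 94.2904, 80.8615, 96.3065, 94.9411, 90.2188, 96.637]
# Per-category iIoU values, reported only for human and vehicle.
category_iiou = [79.2657, 91.5125]

averages = {
    "IoU Classes": round(fmean(class_iou), 4),
    "iIoU Classes": round(fmean(class_iiou), 4),
    "IoU Categories": round(fmean(category_iou), 4),
    "iIoU Categories": round(fmean(category_iiou), 4),
}

if __name__ == "__main__":
    for metric, value in averages.items():
        print(f"{metric}\t{value}")
```

Rounded to four decimals, the recomputed means agree with the reported 85.4336, 70.4246, 93.1669, and 85.3891.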
