Semantic Understanding of Urban Street Scenes

Method Details

Details for method 'Axial-DeepLab-XL [Mapillary Vistas]'

Method overview

name	Axial-DeepLab-XL [Mapillary Vistas]
challenge	pixel-level semantic labeling
details	Convolution exploits locality for efficiency at the cost of missing long-range context. Self-attention has been adopted to augment CNNs with non-local interactions. Recent work has shown that it is possible to stack self-attention layers into a fully attentional network by restricting the attention to a local region. In this paper, we attempt to remove this constraint by factorizing 2D self-attention into two 1D self-attentions. This reduces computational complexity and allows attention to be performed within a larger or even global region. In addition, we propose a position-sensitive self-attention design. Combining both yields our position-sensitive axial-attention layer, a novel building block that can be stacked to form axial-attention models for image classification and dense prediction. We demonstrate the effectiveness of our model on four large-scale datasets. In particular, our model outperforms all existing stand-alone self-attention models on ImageNet. Our Axial-DeepLab improves PQ by 2.8% over the bottom-up state of the art on COCO test-dev. This previous state of the art is matched by our small variant, which is 3.8x more parameter-efficient and 27x more computation-efficient. Axial-DeepLab also achieves state-of-the-art results on Mapillary Vistas and Cityscapes.
publication	Axial-DeepLab: Stand-Alone Axial-Attention for Panoptic Segmentation. Huiyu Wang, Yukun Zhu, Bradley Green, Hartwig Adam, Alan Yuille, Liang-Chieh Chen. ECCV 2020 (spotlight). https://arxiv.org/abs/2003.07853
project page / code	https://github.com/csrhddlam/axial-deeplab
used Cityscapes data	fine annotations
used external data	ImageNet, Mapillary Vistas
runtime	n/a
subsampling	no
submission date	April, 2020
previous submissions
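
The factorization described in the details field (one 2D self-attention replaced by two 1D self-attentions, first along the height axis and then along the width axis) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: it is single-head, omits the position-sensitive terms, and the function names, the projection matrices, and the toy feature-map size are assumptions chosen for illustration only.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention_1d(x, wq, wk, wv):
    """Plain single-head self-attention over the length axis.

    x has shape (batch, length, channels); wq, wk, wv are (channels, channels)
    projection matrices (hypothetical, no position terms).
    """
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v


def axial_attention(x, params_h, params_w):
    """Apply 1D attention along height, then along width.

    x has shape (H, W, C); params_h and params_w are (wq, wk, wv) tuples.
    """
    # Height axis: every column is treated as a sequence of length H.
    cols = x.transpose(1, 0, 2)                            # (W, H, C)
    x = attention_1d(cols, *params_h).transpose(1, 0, 2)   # back to (H, W, C)
    # Width axis: every row is treated as a sequence of length W.
    return attention_1d(x, *params_w)                      # (H, W, C)


def full_attention_pairs(h, w):
    """Number of query-key score entries for global 2D attention."""
    return (h * w) ** 2


def axial_attention_pairs(h, w):
    """Number of query-key score entries for the two 1D attentions."""
    return h * w * (h + w)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    height, width, channels = 4, 6, 8  # toy sizes, chosen arbitrarily
    feature_map = rng.normal(size=(height, width, channels))
    params_h = tuple(rng.normal(size=(channels, channels)) for _ in range(3))
    params_w = tuple(rng.normal(size=(channels, channels)) for _ in range(3))
    out = axial_attention(feature_map, params_h, params_w)
    print(out.shape)
    print(full_attention_pairs(height, width), axial_attention_pairs(height, width))
```

After both passes, every output position has received information from every input position, while the number of attention scores grows with H·W·(H+W) rather than (H·W)².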

Average results

Metric	Value
IoU Classes	84.081
iIoU Classes	65.9748
IoU Categories	92.6413
iIoU Categories	79.7343

Class results

Class	IoU	iIoU
road	98.8595	-
sidewalk	88.3354	-
building	94.5099	-
wall	69.0562	-
fence	67.8041	-
pole	74.4834	-
traffic light	80.2845	-
traffic sign	83.8604	-
vegetation	94.1366	-
terrain	72.8179	-
sky	96.1401	-
person	89.2767	72.0482
rider	78.2424	59.7257
car	96.4119	87.5444
truck	76.3721	51.742
bus	93.0339	69.2707
train	91.3078	66.711
motorcycle	75.674	59.596
bicycle	76.9331	61.1603
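
The averages reported under "Average results" appear to be plain unweighted means of the per-class values above. The following sketch, which simply copies the table into a dictionary (the variable names are illustrative), checks this for "IoU Classes" and "iIoU Classes"; classes shown with "-" have no iIoU and are left out of the iIoU mean.

```python
from statistics import mean

# (IoU, iIoU) per class, copied from the table; None marks a "-" entry.
class_results = {
    "road": (98.8595, None),
    "sidewalk": (88.3354, None),
    "building": (94.5099, None),
    "wall": (69.0562, None),
    "fence": (67.8041, None),
    "pole": (74.4834, None),
    "traffic light": (80.2845, None),
    "traffic sign": (83.8604, None),
    "vegetation": (94.1366, None),
    "terrain": (72.8179, None),
    "sky": (96.1401, None),
    "person": (89.2767, 72.0482),
    "rider": (78.2424, 59.7257),
    "car": (96.4119, 87.5444),
    "truck": (76.3721, 51.742),
    "bus": (93.0339, 69.2707),
    "train": (91.3078, 66.711),
    "motorcycle": (75.674, 59.596),
    "bicycle": (76.9331, 61.1603),
}

iou_values = [iou for iou, _ in class_results.values()]
iiou_values = [iiou for _, iiou in class_results.values() if iiou is not None]

mean_iou = mean(iou_values)    # compare with "IoU Classes"
mean_iiou = mean(iiou_values)  # compare with "iIoU Classes"

print(f"IoU Classes:  {mean_iou:.4f} over {len(iou_values)} classes")
print(f"iIoU Classes: {mean_iiou:.4f} over {len(iiou_values)} classes")
```

Because the table values are already rounded, the recomputed means can differ from the reported ones in the last digit.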

Category results

Category	IoU	iIoU
flat	98.7754	-
nature	93.8613	-
object	79.5828	-
sky	96.1401	-
construction	94.6956	-
human	89.298	73.1755
vehicle	96.1355	86.2931
