Method Details

Details for method 'Axial-DeepLab-XL [Mapillary Vistas]'


Method overview

name Axial-DeepLab-XL [Mapillary Vistas]
challenge pixel-level semantic labeling
details Convolution exploits locality for efficiency at the cost of missing long-range context. Self-attention has been adopted to augment CNNs with non-local interactions. Recent works have shown that it is possible to stack self-attention layers to obtain a fully attentional network by restricting the attention to a local region. In this paper, we attempt to remove this constraint by factorizing 2D self-attention into two 1D self-attentions. This reduces computational complexity and allows performing attention within a larger or even global region. Alongside this, we also propose a position-sensitive self-attention design. Combining both yields our position-sensitive axial-attention layer, a novel building block that one could stack to form axial-attention models for image classification and dense prediction. We demonstrate the effectiveness of our model on four large-scale datasets. In particular, our model outperforms all existing stand-alone self-attention models on ImageNet. Our Axial-DeepLab improves PQ by 2.8% over the bottom-up state of the art on COCO test-dev. This previous state of the art is attained by our small variant, which is 3.8x more parameter-efficient and 27x more computation-efficient. Axial-DeepLab also achieves state-of-the-art results on Mapillary Vistas and Cityscapes.
publication Axial-DeepLab: Stand-Alone Axial-Attention for Panoptic Segmentation
Huiyu Wang, Yukun Zhu, Bradley Green, Hartwig Adam, Alan Yuille, Liang-Chieh Chen
ECCV 2020 (spotlight)
used Cityscapes data fine annotations
used external data ImageNet, Mapillary Vistas
runtime n/a
subsampling no
submission date April, 2020
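
The factorization described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' layer: it uses plain single-head scaled dot-product attention, omits the position-sensitive terms, and the function names (attention_1d, axial_attention, attention_cost) and weight shapes are assumptions made for illustration. It shows one attention pass along the height axis followed by one along the width axis, plus a count of query-key pairs for each scheme.

```python
import numpy as np


def softmax(z, axis=-1):
    """Numerically stable softmax along the given axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def attention_1d(x, wq, wk, wv):
    """Single-head self-attention along the second-to-last axis of x.

    x: (..., L, C); wq, wk: (C, D); wv: (C, C). Returns (..., L, C).
    Hypothetical helper: plain scaled dot-product attention, no positional terms.
    """
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(wq.shape[1])  # (..., L, L)
    return softmax(scores, axis=-1) @ v


def axial_attention(x, params_h, params_w):
    """Apply 1D attention along height, then along width. x: (H, W, C)."""
    # Height axis: treat every column as a sequence of length H.
    xh = np.swapaxes(x, 0, 1)            # (W, H, C)
    xh = attention_1d(xh, *params_h)
    x = np.swapaxes(xh, 0, 1)            # back to (H, W, C)
    # Width axis: every row is already a sequence of length W.
    return attention_1d(x, *params_w)


def attention_cost(h, w):
    """Query-key pairs for full 2D attention vs. two 1D axial passes."""
    full = (h * w) ** 2
    axial = h * w * (h + w)
    return full, axial
```

For a 64x64 feature map, attention_cost(64, 64) gives 16,777,216 query-key pairs for full 2D attention against 524,288 for the two axial passes, a 32x reduction.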


Average results

Metric            Value
IoU Classes       84.081
iIoU Classes      65.9748
IoU Categories    92.6413
iIoU Categories   79.7343
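
As a reminder of what the IoU figures measure, here is a minimal sketch of the standard per-class intersection-over-union, TP / (TP + FP + FN), computed from a pixel confusion matrix. The function name and the toy matrix are hypothetical, and the instance-weighted iIoU variant is not covered.

```python
import numpy as np


def per_class_iou(conf):
    """Per-class IoU from a confusion matrix.

    conf[i, j] = number of pixels with ground-truth class i predicted as class j.
    Returns an array of IoU values in [0, 1]; NaN for classes that never occur.
    """
    conf = np.asarray(conf, dtype=np.float64)
    tp = np.diag(conf)                 # correctly labeled pixels
    fp = conf.sum(axis=0) - tp         # predicted as the class, but wrong
    fn = conf.sum(axis=1) - tp         # belong to the class, but missed
    denom = tp + fp + fn
    return np.where(denom > 0, tp / np.maximum(denom, 1), np.nan)


# Toy two-class example (made-up counts, not benchmark data).
toy = [[8, 2],
       [1, 9]]
iou = per_class_iou(toy)
print(iou * 100)          # per-class IoU as percentages
print(iou.mean() * 100)   # mean IoU over classes
```

Multiplying by 100 gives values on the same percentage scale as the tables on this page.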


Class results

Class           IoU       iIoU
road            98.8595   -
sidewalk        88.3354   -
building        94.5099   -
wall            69.0562   -
fence           67.8041   -
pole            74.4834   -
traffic light   80.2845   -
traffic sign    83.8604   -
vegetation      94.1366   -
terrain         72.8179   -
sky             96.1401   -
person          89.2767   72.0482
rider           78.2424   59.7257
car             96.4119   87.5444
truck           76.3721   51.742
bus             93.0339   69.2707
train           91.3078   66.711
motorcycle      75.674    59.596
bicycle         76.9331   61.1603
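
The averages reported under "Average results" can be cross-checked against this table. The short script below assumes they are unweighted means over the classes that have a value, and the helper name mean is illustrative.

```python
# Per-class values copied from the class results table.
class_iou = {
    "road": 98.8595, "sidewalk": 88.3354, "building": 94.5099,
    "wall": 69.0562, "fence": 67.8041, "pole": 74.4834,
    "traffic light": 80.2845, "traffic sign": 83.8604,
    "vegetation": 94.1366, "terrain": 72.8179, "sky": 96.1401,
    "person": 89.2767, "rider": 78.2424, "car": 96.4119,
    "truck": 76.3721, "bus": 93.0339, "train": 91.3078,
    "motorcycle": 75.674, "bicycle": 76.9331,
}

# iIoU is reported only for the eight classes with instance annotations.
class_iiou = {
    "person": 72.0482, "rider": 59.7257, "car": 87.5444, "truck": 51.742,
    "bus": 69.2707, "train": 66.711, "motorcycle": 59.596, "bicycle": 61.1603,
}


def mean(values):
    """Unweighted arithmetic mean of an iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)


print(f"IoU Classes:  {mean(class_iou.values()):.4f}")
print(f"iIoU Classes: {mean(class_iiou.values()):.4f}")
```

Both means agree with the reported IoU Classes (84.081) and iIoU Classes (65.9748) to within rounding.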


Category results

Category       IoU       iIoU
flat           98.7754   -
nature         93.8613   -
object         79.5828   -
sky            96.1401   -
construction   94.6956   -
human          89.298    73.1755
vehicle        96.1355   86.2931


