Method Details

Details for method 'Axial-DeepLab-XL [Cityscapes-fine]'


Method overview

name Axial-DeepLab-XL [Cityscapes-fine]
challenge pixel-level semantic labeling
details Convolution exploits locality for efficiency at a cost of missing long range context. Self-attention has been adopted to augment CNNs with non-local interactions. Recent works have proved it possible to stack self-attention layers to obtain a fully attentional network by restricting the attention to a local region. In this paper, we attempt to remove this constraint by factorizing 2D self-attention into two 1D self-attentions. This reduces computational complexity and allows performing attention within a larger or even global region. As a companion, we also propose a position-sensitive self-attention design. Combining both yields our position-sensitive axial-attention layer, a novel building block that one could stack to form axial-attention models for image classification and dense prediction. We demonstrate the effectiveness of our model on four large-scale datasets. In particular, our model outperforms all existing stand-alone self-attention models on ImageNet. Our Axial-DeepLab improves 2.8% PQ over bottom-up state-of-the-art on COCO test-dev. This previous state-of-the-art is attained by our small variant that is 3.8x more parameter-efficient and 27x more computation-efficient. Axial-DeepLab also achieves state-of-the-art results on Mapillary Vistas and Cityscapes.
publication Axial-DeepLab: Stand-Alone Axial-Attention for Panoptic Segmentation
Huiyu Wang, Yukun Zhu, Bradley Green, Hartwig Adam, Alan Yuille, Liang-Chieh Chen
ECCV 2020 (spotlight)
used Cityscapes data fine annotations
used external data ImageNet
runtime n/a
subsampling no
submission date March, 2020
previous submissions
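
The factorization described in the details field (one 2D self-attention replaced by two 1D self-attentions) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a single head, uses the input features directly as queries, keys, and values (no learned projections), and omits the position-sensitive terms. The names `attend_1d`, `axial_attention`, and `score_counts` are hypothetical, and the height-then-width order is an assumption of this sketch.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend_1d(seq):
    """Single-head self-attention along one axis.

    seq is a list of feature vectors. Simplification (assumption):
    queries, keys, and values are the inputs themselves.
    """
    d = len(seq[0])
    out = []
    for q in seq:
        # Scaled dot-product score of this query against every key on the axis.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in seq]
        weights = softmax(scores)
        # Weighted sum of the values.
        out.append([sum(w * v[c] for w, v in zip(weights, seq)) for c in range(d)])
    return out


def axial_attention(x):
    """Apply 1D attention along the height axis, then along the width axis.

    x is an H x W grid (nested lists) of feature vectors.
    """
    h, w = len(x), len(x[0])
    # Height axis: each column is attended independently.
    cols = [attend_1d([x[i][j] for i in range(h)]) for j in range(w)]
    y = [[cols[j][i] for j in range(w)] for i in range(h)]
    # Width axis: each row is attended independently.
    return [attend_1d(row) for row in y]


def score_counts(h, w):
    """Number of query-key scores: full 2D attention vs. global axial attention."""
    full = (h * w) ** 2        # every position attends to every position
    axial = h * w * (h + w)    # every position attends to its column, then its row
    return full, axial
```

For a 64 x 64 feature map, `score_counts(64, 64)` gives 16,777,216 scores for full 2D attention against 524,288 for the axial form, a 32x reduction. This count only illustrates why attention over a larger or global region becomes affordable; it says nothing about the parameter or FLOP figures quoted in the abstract.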


Average results

Metric            Value
IoU Classes       79.8802
iIoU Classes      57.2981
IoU Categories    91.6342
iIoU Categories   76.6934


Class results

Class           IoU       iIoU
road            98.6914   -
sidewalk        86.9585   -
building        93.5435   -
wall            57.9329   -
fence           60.357    -
pole            70.8665   -
traffic light   77.8618   -
traffic sign    81.3974   -
vegetation      93.7382   -
terrain         72.8237   -
sky             95.6477   -
person          87.9059   68.9554
rider           75.3043   55.1193
car             96.0997   85.1918
truck           65.8359   37.3287
bus             80.4516   54.379
train           78.6901   52.2905
motorcycle      72.7989   50.6724
bicycle         70.8193   54.4473
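
The class-level averages reported under "Average results" appear to be unweighted means of the per-class values. The short check below, a sketch with the table values copied in by hand and a hypothetical helper `mean`, recomputes both averages; iIoU is averaged only over the eight classes that report it.

```python
# Per-class scores (percent), copied from the class results table.
class_iou = {
    "road": 98.6914, "sidewalk": 86.9585, "building": 93.5435,
    "wall": 57.9329, "fence": 60.357, "pole": 70.8665,
    "traffic light": 77.8618, "traffic sign": 81.3974,
    "vegetation": 93.7382, "terrain": 72.8237, "sky": 95.6477,
    "person": 87.9059, "rider": 75.3043, "car": 96.0997,
    "truck": 65.8359, "bus": 80.4516, "train": 78.6901,
    "motorcycle": 72.7989, "bicycle": 70.8193,
}
# iIoU is listed only for the person and vehicle classes.
class_iiou = {
    "person": 68.9554, "rider": 55.1193, "car": 85.1918,
    "truck": 37.3287, "bus": 54.379, "train": 52.2905,
    "motorcycle": 50.6724, "bicycle": 54.4473,
}


def mean(values):
    """Unweighted arithmetic mean."""
    values = list(values)
    return sum(values) / len(values)


mean_iou = mean(class_iou.values())
mean_iiou = mean(class_iiou.values())
print(f"IoU Classes:  {mean_iou:.4f}")
print(f"iIoU Classes: {mean_iiou:.4f}")
```

Both results agree with the reported 79.8802 and 57.2981 to within rounding of the last digit.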


Category results

Category       IoU       iIoU
flat           98.687    -
nature         93.4261   -
object         76.4727   -
sky            95.6477   -
construction   93.8498   -
human          88.1048   70.3832
vehicle        95.2511   83.0036
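
The category-level averages can be checked the same way as any unweighted mean. In this sketch the values are copied by hand from the category results table and the variable names are hypothetical; iIoU is averaged only over the two categories that report it.

```python
# Per-category scores (percent), copied from the category results table.
category_iou = {
    "flat": 98.687, "nature": 93.4261, "object": 76.4727, "sky": 95.6477,
    "construction": 93.8498, "human": 88.1048, "vehicle": 95.2511,
}
# iIoU is listed only for the human and vehicle categories.
category_iiou = {"human": 70.3832, "vehicle": 83.0036}

# Unweighted arithmetic means over the available entries.
mean_category_iou = sum(category_iou.values()) / len(category_iou)
mean_category_iiou = sum(category_iiou.values()) / len(category_iiou)

print(f"IoU Categories:  {mean_category_iou:.4f}")
print(f"iIoU Categories: {mean_category_iiou:.4f}")
```

The recomputed values agree with the reported 91.6342 and 76.6934 to within rounding of the last digit.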


