Semantic Understanding of Urban Street Scenes

Method Details

Details for method 'Axial-DeepLab-L [Cityscapes-fine]'

Method overview

name	Axial-DeepLab-L [Cityscapes-fine]
challenge	pixel-level semantic labeling
details	Convolution exploits locality for efficiency, at the cost of missing long-range context. Self-attention has been adopted to augment CNNs with non-local interactions. Recent works show that it is possible to stack self-attention layers into a fully attentional network by restricting the attention to a local region. In this paper, we attempt to remove this constraint by factorizing 2D self-attention into two 1D self-attentions. This reduces computational complexity and allows attention to be performed within a larger or even global region. Alongside this, we also propose a position-sensitive self-attention design. Combining both yields our position-sensitive axial-attention layer, a novel building block that can be stacked to form axial-attention models for image classification and dense prediction. We demonstrate the effectiveness of our model on four large-scale datasets. In particular, our model outperforms all existing stand-alone self-attention models on ImageNet. Our Axial-DeepLab improves PQ by 2.8% over the bottom-up state of the art on COCO test-dev. That previous state of the art is attained by our small variant, which is 3.8x more parameter-efficient and 27x more computation-efficient. Axial-DeepLab also achieves state-of-the-art results on Mapillary Vistas and Cityscapes.
publication	Axial-DeepLab: Stand-Alone Axial-Attention for Panoptic Segmentation. Huiyu Wang, Yukun Zhu, Bradley Green, Hartwig Adam, Alan Yuille, Liang-Chieh Chen. ECCV 2020 (spotlight). https://arxiv.org/abs/2003.07853
project page / code	https://github.com/csrhddlam/axial-deeplab
used Cityscapes data	fine annotations
used external data	ImageNet
runtime	n/a
subsampling	no
submission date	March, 2020
previous submissions
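
The factorization described in the details field, one 2D self-attention replaced by two 1D self-attentions, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: it uses a single head, omits the position-sensitive terms, residual connections, and normalization, and the height-then-width order, the function names, and the random projection matrices are all assumptions made for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention_1d(x, wq, wk, wv):
    """Single-head self-attention over the second-to-last axis.

    x has shape (..., n, c); each leading index is an independent sequence.
    """
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(q.shape[-1])  # (..., n, n)
    return softmax(scores, axis=-1) @ v

def axial_attention(x, params_h, params_w):
    """Hypothetical axial attention on a feature map x of shape (H, W, C)."""
    # Height axis: every column is a sequence of H positions.
    cols = np.swapaxes(x, 0, 1)                               # (W, H, C)
    y = np.swapaxes(attention_1d(cols, *params_h), 0, 1)      # (H, W, C)
    # Width axis: every row is a sequence of W positions.
    return attention_1d(y, *params_w)                         # (H, W, C)

rng = np.random.default_rng(0)
H, W, C = 6, 8, 4
x = rng.standard_normal((H, W, C))
params_h = tuple(rng.standard_normal((C, C)) for _ in range(3))  # wq, wk, wv
params_w = tuple(rng.standard_normal((C, C)) for _ in range(3))
out = axial_attention(x, params_h, params_w)

# Number of pairwise attention scores: full 2D versus the two axial passes.
full_2d = (H * W) ** 2        # every pixel attends to every pixel
axial = H * W * (H + W)       # every pixel attends to its column, then its row
print(out.shape, full_2d, axial)
```

For this 6 x 8 toy map the full 2D attention would compute 2304 scores while the two axial passes compute 672, and the gap widens as the feature map grows. After both passes every output position can still depend on every input position, which is what makes the global region affordable.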

Average results

Metric	Value
IoU Classes	79.4968
iIoU Classes	57.4839
IoU Categories	91.4892
iIoU Categories	76.2137

Class results

Class	IoU	iIoU
road	98.6313	-
sidewalk	86.5022	-
building	93.4365	-
wall	52.0461	-
fence	61.3156	-
pole	70.1632	-
traffic light	77.5351	-
traffic sign	81.0828	-
vegetation	93.4519	-
terrain	72.3189	-
sky	95.6783	-
person	87.9289	68.2537
rider	75.6083	55.4641
car	96.0295	84.76
truck	68.4795	39.7625
bus	81.3838	55.8922
train	77.0785	51.1847
motorcycle	71.0344	51.0277
bicycle	70.7347	53.5266
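
The class table can be cross-checked against the average results. The short Python sketch below assumes the reported averages are unweighted means over classes, with iIoU averaged only over the classes that report it; the values are copied from the table, and the variable names are illustrative.

```python
# (IoU, iIoU) per class, copied from the class results table.
# None marks classes for which no iIoU is reported.
class_results = {
    "road": (98.6313, None),
    "sidewalk": (86.5022, None),
    "building": (93.4365, None),
    "wall": (52.0461, None),
    "fence": (61.3156, None),
    "pole": (70.1632, None),
    "traffic light": (77.5351, None),
    "traffic sign": (81.0828, None),
    "vegetation": (93.4519, None),
    "terrain": (72.3189, None),
    "sky": (95.6783, None),
    "person": (87.9289, 68.2537),
    "rider": (75.6083, 55.4641),
    "car": (96.0295, 84.76),
    "truck": (68.4795, 39.7625),
    "bus": (81.3838, 55.8922),
    "train": (77.0785, 51.1847),
    "motorcycle": (71.0344, 51.0277),
    "bicycle": (70.7347, 53.5266),
}

ious = [iou for iou, _ in class_results.values()]
iious = [iiou for _, iiou in class_results.values() if iiou is not None]

mean_iou = sum(ious) / len(ious)       # mean over all 19 classes
mean_iiou = sum(iious) / len(iious)    # mean over the 8 classes with iIoU

print(f"{mean_iou:.4f} {mean_iiou:.4f}")  # 79.4968 57.4839
```

Under that assumption the recomputed means agree with the reported IoU Classes and iIoU Classes values to four decimal places.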

Category results

Category	IoU	iIoU
flat	98.6638	-
nature	93.1728	-
object	76.0034	-
sky	95.6783	-
construction	93.6072	-
human	88.0594	69.6875
vehicle	95.2395	82.7399
