Semantic Understanding of Urban Street Scenes

Method Details

Details for method 'SPGNet'

Method overview

name	SPGNet
challenge	pixel-level semantic labeling
details	Multi-scale context modules and single-stage encoder-decoder structures are commonly employed for semantic segmentation. A multi-scale context module refers to operations that aggregate feature responses from a large spatial extent, while a single-stage encoder-decoder structure encodes high-level semantic information in the encoder path and recovers boundary information in the decoder path. In contrast, multi-stage encoder-decoder networks have been widely used in human pose estimation and outperform their single-stage counterparts. However, few efforts have been made to bring this effective design to semantic segmentation. In this work, we propose a Semantic Prediction Guidance (SPG) module which learns to re-weight the local features through guidance from the pixel-wise semantic prediction. We find that by carefully re-weighting features across stages, a two-stage encoder-decoder network coupled with our proposed SPG module can significantly outperform its one-stage counterpart with a similar parameter count and computational cost. Finally, we report experimental results on the semantic segmentation benchmark Cityscapes, in which our SPGNet attains 81.1% mean IoU on the test set using only 'fine' annotations.
publication	SPGNet: Semantic Prediction Guidance for Scene Parsing. Bowen Cheng, Liang-Chieh Chen, Yunchao Wei, Yukun Zhu, Zilong Huang, Jinjun Xiong, Thomas Huang, Wen-Mei Hwu, Honghui Shi. ICCV 2019. https://arxiv.org/abs/1908.09798
project page / code
used Cityscapes data	fine annotations
used external data	ImageNet
runtime	n/a
subsampling	no
submission date	March, 2019
previous submissions
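
The details field above says the SPG module re-weights local features under guidance from the pixel-wise semantic prediction. The sketch below illustrates that general idea only and is not the layer design from the paper: it assumes the guidance is a sigmoid gate computed from softmax class probabilities by a per-pixel linear map (equivalent to a 1x1 convolution). The names `reweight_pixel` and `reweight_map` and the parameters `weight` and `bias` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over one pixel's class scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def reweight_pixel(feature, class_scores, weight, bias):
    """Gate one pixel's feature vector by its semantic prediction.

    feature:      C floats, the local feature at this pixel
    class_scores: K floats, the stage-1 semantic logits at this pixel
    weight, bias: hypothetical learned parameters (C x K and C),
                  assumed here for illustration
    """
    probs = softmax(class_scores)
    gated = []
    for c, value in enumerate(feature):
        # Per-pixel linear map from K class probabilities to channel c.
        z = sum(w * p for w, p in zip(weight[c], probs)) + bias[c]
        gate = 1.0 / (1.0 + math.exp(-z))  # sigmoid gate in (0, 1)
        gated.append(value * gate)
    return gated


def reweight_map(features, scores, weight, bias):
    """Apply reweight_pixel at every location of an H x W grid."""
    return [
        [reweight_pixel(f, s, weight, bias) for f, s in zip(f_row, s_row)]
        for f_row, s_row in zip(features, scores)
    ]
```

With all-zero `weight` and `bias` every gate is 0.5, so each feature is halved. In a trained network the gate would instead vary with the predicted class at each pixel.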

Average results

Metric	Value
IoU Classes	81.0901
iIoU Classes	61.447
IoU Categories	92.1467
iIoU Categories	82.0908
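
For reference, the two metrics in the table above follow the definitions used by the Cityscapes benchmark, stated here from the benchmark's documentation rather than from this page. TP, FP and FN are pixel counts of true positives, false positives and false negatives accumulated over the test set. For iIoU, each pixel's contribution to iTP and iFN is weighted by the ratio of the class's average instance size to the size of the ground-truth instance the pixel belongs to, which is why iIoU is reported only for classes with instance annotations.

```latex
\mathrm{IoU} = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FP} + \mathrm{FN}},
\qquad
\mathrm{iIoU} = \frac{\mathrm{iTP}}{\mathrm{iTP} + \mathrm{FP} + \mathrm{iFN}}
```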

Class results

Class	IoU	iIoU
road	98.7924	-
sidewalk	87.605	-
building	93.7729	-
wall	56.4799	-
fence	61.9184	-
pole	71.8958	-
traffic light	79.9507	-
traffic sign	82.0779	-
vegetation	94.0825	-
terrain	73.5125	-
sky	96.099	-
person	88.6817	73.4989
rider	74.9341	54.8083
car	96.4726	91.5607
truck	67.3384	42.5601
bus	84.8188	57.6436
train	81.7982	53.6576
motorcycle	71.1094	51.2846
bicycle	79.3718	66.5625
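
The averages reported under "Average results" appear to be plain unweighted means of the per-class rows above. The following check, with values copied from the table, is a small sketch for confirming that; the variable names are illustrative.

```python
# Per-class IoU values copied from the "Class results" table (19 classes).
class_iou = {
    "road": 98.7924, "sidewalk": 87.605, "building": 93.7729,
    "wall": 56.4799, "fence": 61.9184, "pole": 71.8958,
    "traffic light": 79.9507, "traffic sign": 82.0779,
    "vegetation": 94.0825, "terrain": 73.5125, "sky": 96.099,
    "person": 88.6817, "rider": 74.9341, "car": 96.4726,
    "truck": 67.3384, "bus": 84.8188, "train": 81.7982,
    "motorcycle": 71.1094, "bicycle": 79.3718,
}

# iIoU is listed only for the 8 classes that have a value in the table.
class_iiou = {
    "person": 73.4989, "rider": 54.8083, "car": 91.5607,
    "truck": 42.5601, "bus": 57.6436, "train": 53.6576,
    "motorcycle": 51.2846, "bicycle": 66.5625,
}


def mean(values):
    """Unweighted arithmetic mean."""
    values = list(values)
    return sum(values) / len(values)


mean_iou = mean(class_iou.values())
mean_iiou = mean(class_iiou.values())

print(f"IoU Classes:  {mean_iou:.4f}")   # 81.0901
print(f"iIoU Classes: {mean_iiou:.4f}")  # 61.4470
```

Both values agree with the reported averages, 81.0901 and 61.447. Averaging the rows of the "Category results" table in the same way reproduces the category averages.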

Category results

Category	IoU	iIoU
flat	98.7939	-
nature	93.7539	-
object	77.4615	-
sky	96.099	-
construction	94.2022	-
human	88.8411	74.4925
vehicle	95.8751	89.6891
