Semantic Understanding of Urban Street Scenes

Method Details

Details for method 'Naive-Student (iterative semi-supervised learning with Panoptic-DeepLab)'

Method overview

name	Naive-Student (iterative semi-supervised learning with Panoptic-DeepLab)
challenge	pixel-level semantic labeling
details	Supervised learning in large discriminative models is a mainstay of modern computer vision. Such an approach requires investing in large-scale human-annotated datasets to achieve state-of-the-art results. In turn, the efficacy of supervised learning may be limited by the size of the human-annotated dataset. This limitation is particularly notable for image segmentation tasks, where the expense of human annotation is especially large, yet large amounts of unlabeled data may exist. In this work, we ask whether we can leverage semi-supervised learning on unlabeled video sequences to improve performance on urban scene segmentation, simultaneously tackling semantic, instance, and panoptic segmentation. The goal of this work is to avoid the construction of sophisticated, learned architectures specific to label propagation (e.g., patch matching and optical flow). Instead, we simply predict pseudo-labels for the unlabeled data and train subsequent models with both human-annotated and pseudo-labeled data. The procedure is iterated several times. As a result, our Naive-Student model, trained with this simple yet effective iterative semi-supervised learning, attains state-of-the-art results on all three Cityscapes benchmarks, reaching 67.8% PQ, 42.6% AP, and 85.2% mIoU on the test set. We view this work as a notable step towards building a simple procedure to harness unlabeled video sequences to surpass state-of-the-art performance on core computer vision tasks.
publication	Semi-Supervised Learning in Video Sequences for Urban Scene Segmentation Liang-Chieh Chen, Raphael Gontijo Lopes, Bowen Cheng, Maxwell D. Collins, Ekin D. Cubuk, Barret Zoph, Hartwig Adam, Jonathon Shlens https://arxiv.org/abs/2005.10266
project page / code
used Cityscapes data	fine annotations, video
used external data	ImageNet, Mapillary Vistas Research Edition, Cityscapes train-extra set (images only; coarse labels are not used).
runtime	n/a
subsampling	no
submission date	April, 2020
previous submissions
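
The iterative procedure described in the details field (predict pseudo-labels for unlabeled data, retrain on human-annotated plus pseudo-labeled data, repeat) can be illustrated with a toy sketch. This is not the authors' implementation: a nearest-centroid classifier on made-up 1-D points stands in for Panoptic-DeepLab, and the names `train`, `predict`, and `naive_student` are hypothetical.

```python
from statistics import mean

def train(samples):
    """Fit a toy nearest-centroid 'model': one mean value per class.

    Assumption: this stands in for training a segmentation network.
    """
    by_class = {}
    for x, y in samples:
        by_class.setdefault(y, []).append(x)
    return {y: mean(xs) for y, xs in by_class.items()}

def predict(model, x):
    """Return the class whose centroid is closest to x."""
    return min(model, key=lambda y: abs(x - model[y]))

def naive_student(labeled, unlabeled, iterations=2):
    """Iterative pseudo-labeling loop (illustrative only)."""
    # Initial teacher: trained on human-annotated data only.
    model = train(labeled)
    for _ in range(iterations):
        # Predict pseudo-labels for the unlabeled data.
        pseudo = [(x, predict(model, x)) for x in unlabeled]
        # Train the next model on human-annotated and pseudo-labeled data.
        model = train(labeled + pseudo)
    return model

# Made-up data: (value, class) pairs and unlabeled values.
labeled = [(0.0, 0), (1.0, 0), (9.0, 1), (10.0, 1)]
unlabeled = [2.0, 3.0, 7.0, 8.0]
model = naive_student(labeled, unlabeled)
```

In this toy setting the pseudo-labeled points pull each class centroid toward the unlabeled data, which is the only effect the sketch is meant to show.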

Average results

Metric	Value
IoU Classes	85.1738
iIoU Classes	68.8238
IoU Categories	92.9195
iIoU Categories	81.9744

Class results

Class	IoU	iIoU
road	98.8251	-
sidewalk	88.2864	-
building	94.585	-
wall	65.2606	-
fence	69.6367	-
pole	75.2256	-
traffic light	80.9368	-
traffic sign	84.4137	-
vegetation	94.2839	-
terrain	74.451	-
sky	96.2182	-
person	89.9846	75.4622
rider	79.6991	62.1307
car	96.6685	88.9953
truck	82.9717	57.225
bus	95.5578	70.9907
train	93.3543	67.2109
motorcycle	78.3798	62.2471
bicycle	79.564	66.3282

Category results

Category	IoU	iIoU
flat	98.8496	-
nature	94.0671	-
object	80.2348	-
sky	96.2182	-
construction	94.848	-
human	89.8476	76.1456
vehicle	96.3713	87.8032
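
The figures under "Average results" are consistent with unweighted means of the per-class and per-category rows above. The following check, using only values copied from the tables, shows this; the variable and function names are illustrative, and averaging is an inference from the numbers rather than a statement of how the benchmark server computes them.

```python
from statistics import mean

# Per-class IoU, in table order (road ... bicycle).
class_iou = [
    98.8251, 88.2864, 94.585, 65.2606, 69.6367, 75.2256, 80.9368,
    84.4137, 94.2839, 74.451, 96.2182, 89.9846, 79.6991, 96.6685,
    82.9717, 95.5578, 93.3543, 78.3798, 79.564,
]
# Per-class iIoU, reported only for person ... bicycle.
class_iiou = [
    75.4622, 62.1307, 88.9953, 57.225, 70.9907, 67.2109, 62.2471, 66.3282,
]
# Per-category IoU (flat ... vehicle) and iIoU (human, vehicle).
category_iou = [98.8496, 94.0671, 80.2348, 96.2182, 94.848, 89.8476, 96.3713]
category_iiou = [76.1456, 87.8032]

def summarize(values):
    """Unweighted mean, rounded to the table's four decimals."""
    return round(mean(values), 4)

averages = {
    "IoU Classes": summarize(class_iou),
    "iIoU Classes": summarize(class_iiou),
    "IoU Categories": summarize(category_iou),
    "iIoU Categories": summarize(category_iiou),
}
for name, value in averages.items():
    print(f"{name}\t{value}")
```

Rows marked "-" in the iIoU column are skipped, so the iIoU averages cover only the classes and categories that report an instance-level score.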
