Method Details


Details for method 'FarSee-Net'

 

Method overview

name FarSee-Net
challenge pixel-level semantic labeling
details FarSee-Net: Real-Time Semantic Segmentation by Efficient Multi-scale Context Aggregation and Feature Space Super-resolution. Real-time semantic segmentation is desirable in many robotic applications with limited computation resources. One challenge of semantic segmentation is to deal with object scale variations and leverage the context. How to perform multi-scale context aggregation within a limited computation budget is important. In this paper, we first introduce a novel and efficient module called Cascaded Factorized Atrous Spatial Pyramid Pooling (CF-ASPP). It is a lightweight cascaded structure for Convolutional Neural Networks (CNNs) to efficiently leverage context information. On the other hand, for runtime efficiency, state-of-the-art methods quickly decrease the spatial size of the inputs or feature maps in the early network stages. The final high-resolution result is usually obtained by a non-parametric up-sampling operation (e.g. bilinear interpolation). In contrast, we rethink this pipeline and treat it as a super-resolution process. We use an optimized super-resolution operation in the up-sampling step and improve the accuracy, especially in the sub-sampled input image scenario for real-time applications. By fusing the above two improvements, our method provides a better latency-accuracy trade-off than other state-of-the-art methods. In particular, we achieve 68.4% mIoU at 84 fps on the Cityscapes test set with a single NVIDIA Titan X (Maxwell) GPU card. The proposed module can be plugged into any feature extraction CNN and benefits from the CNN structure development.
publication FarSee-Net: Real-Time Semantic Segmentation by Efficient Multi-scale Context Aggregation and Feature Space Super-resolution
Zhanpeng Zhang and Kaipeng Zhang
IEEE International Conference on Robotics and Automation (ICRA) 2020
project page / code
used Cityscapes data fine annotations
used external data ImageNet
runtime 0.0119 s
NVIDIA Titan X (Maxwell)
subsampling 2
submission date September, 2019
previous submissions
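
The runtime listed above can be cross-checked against the frame rate quoted in the details field. The following is a minimal sketch, assuming throughput is simply the reciprocal of the per-image runtime; the variable names are illustrative and not part of the benchmark.

```python
# Per-image runtime reported in the method overview, in seconds.
runtime_s = 0.0119

# Assumption: frames per second is the reciprocal of per-image latency
# (no batching or pipelining effects are modeled here).
fps = 1.0 / runtime_s

print(f"{fps:.1f} fps")  # prints "84.0 fps"
```

Under that assumption the result agrees with the 84 fps stated in the abstract.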

 

Average results

Metric Value
IoU Classes 68.3661
iIoU Classes 39.3392
IoU Categories 85.9333
iIoU Categories 69.7428

 

Class results

Class IoU iIoU
road 97.9262 -
sidewalk 81.4137 -
building 89.8609 -
wall 38.6137 -
fence 43.5661 -
pole 53.1829 -
traffic light 58.8496 -
traffic sign 64.3553 -
vegetation 91.0366 -
terrain 67.7472 -
sky 94.0262 -
person 75.9136 53.9215
rider 57.292 29.017
car 93.2387 86.0932
truck 55.876 19.2969
bus 67.7575 30.1973
train 55.0709 28.1058
motorcycle 49.3623 23.3532
bicycle 63.8673 44.7284
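
The "IoU Classes" and "iIoU Classes" figures in the average results appear to be unweighted means of the per-class values above. The sketch below checks this, assuming a plain arithmetic mean over the 19 classes for IoU and over the 8 classes with an iIoU entry; the dictionary and function names are illustrative.

```python
# Per-class IoU values copied from the class results table.
class_iou = {
    "road": 97.9262, "sidewalk": 81.4137, "building": 89.8609,
    "wall": 38.6137, "fence": 43.5661, "pole": 53.1829,
    "traffic light": 58.8496, "traffic sign": 64.3553,
    "vegetation": 91.0366, "terrain": 67.7472, "sky": 94.0262,
    "person": 75.9136, "rider": 57.292, "car": 93.2387,
    "truck": 55.876, "bus": 67.7575, "train": 55.0709,
    "motorcycle": 49.3623, "bicycle": 63.8673,
}

# iIoU is only listed for classes that have instances ("-" rows are omitted).
class_iiou = {
    "person": 53.9215, "rider": 29.017, "car": 86.0932, "truck": 19.2969,
    "bus": 30.1973, "train": 28.1058, "motorcycle": 23.3532,
    "bicycle": 44.7284,
}


def mean(values):
    """Unweighted arithmetic mean (assumed averaging rule)."""
    values = list(values)
    return sum(values) / len(values)


mean_iou = mean(class_iou.values())
mean_iiou = mean(class_iiou.values())

print(f"IoU Classes:  {mean_iou:.4f}")   # 68.3661
print(f"iIoU Classes: {mean_iiou:.4f}")  # 39.3392
```

Both values match the reported averages to four decimal places.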

 

Category results

Category IoU iIoU
flat 98.1201 -
nature 90.5937 -
object 60.1353 -
sky 94.0262 -
construction 89.863 -
human 76.546 55.8137
vehicle 92.2484 83.6719
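
The category-level averages can be checked the same way. This sketch again assumes an unweighted mean, taken over the 7 categories for IoU and over the 2 categories with an iIoU entry; the names are illustrative.

```python
# Per-category values copied from the category results table.
category_iou = {
    "flat": 98.1201, "nature": 90.5937, "object": 60.1353, "sky": 94.0262,
    "construction": 89.863, "human": 76.546, "vehicle": 92.2484,
}

# Only "human" and "vehicle" have an iIoU entry.
category_iiou = {"human": 55.8137, "vehicle": 83.6719}

# Assumed averaging rule: plain arithmetic mean over the listed entries.
mean_cat_iou = sum(category_iou.values()) / len(category_iou)
mean_cat_iiou = sum(category_iiou.values()) / len(category_iiou)

print(f"IoU Categories:  {mean_cat_iou:.4f}")   # 85.9333
print(f"iIoU Categories: {mean_cat_iiou:.4f}")  # 69.7428
```

The results agree with the "IoU Categories" and "iIoU Categories" rows in the average results.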

 
