Method Details
Details for method 'FarSee-Net'
Method overview
| Field | Value |
|---|---|
| name | FarSee-Net |
| challenge | pixel-level semantic labeling |
| details | FarSee-Net: Real-Time Semantic Segmentation by Efficient Multi-scale Context Aggregation and Feature Space Super-resolution. Real-time semantic segmentation is desirable in many robotic applications with limited computation resources. One challenge of semantic segmentation is to deal with object scale variations and leverage the context. How to perform multi-scale context aggregation within a limited computation budget is important. In this paper, we first introduce a novel and efficient module called Cascaded Factorized Atrous Spatial Pyramid Pooling (CF-ASPP). It is a lightweight cascaded structure for Convolutional Neural Networks (CNNs) that efficiently leverages context information. On the other hand, for runtime efficiency, state-of-the-art methods quickly decrease the spatial size of the inputs or feature maps in the early network stages. The final high-resolution result is usually obtained by a non-parametric up-sampling operation (e.g. bilinear interpolation). In contrast, we rethink this pipeline and treat it as a super-resolution process. We use an optimized super-resolution operation in the up-sampling step and improve the accuracy, especially in the sub-sampled input image scenario for real-time applications. By fusing the above two improvements, our method provides a better latency-accuracy trade-off than other state-of-the-art methods. In particular, we achieve 68.4% mIoU at 84 fps on the Cityscapes test set with a single NVIDIA Titan X (Maxwell) GPU card. The proposed module can be plugged into any feature extraction CNN and benefits from advances in CNN architecture. |
| publication | FarSee-Net: Real-Time Semantic Segmentation by Efficient Multi-scale Context Aggregation and Feature Space Super-resolution. Zhanpeng Zhang and Kaipeng Zhang. IEEE International Conference on Robotics and Automation (ICRA), 2020. |
| project page / code | |
| used Cityscapes data | fine annotations |
| used external data | ImageNet |
| runtime | 0.0119 s on an NVIDIA Titan X (Maxwell) |
| subsampling | 2 |
| submission date | September, 2019 |
| previous submissions | |
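The runtime entry and the frame rate quoted in the method description can be cross-checked with a simple conversion. The following is a minimal sketch, assuming the reported runtime is the per-frame latency for a single image with no batching; the variable names are illustrative only.

```python
# Per-frame runtime from the "runtime" row of the overview table (seconds).
runtime_s = 0.0119

# Frames per second is the reciprocal of the per-frame latency
# (assumption: one image per forward pass, no batching).
fps = 1.0 / runtime_s

print(f"{fps:.1f} fps")
```

The result is roughly 84 fps, which is consistent with the figure stated in the description.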
Average results
| Metric | Value |
|---|---|
| IoU Classes | 68.3661 |
| iIoU Classes | 39.3392 |
| IoU Categories | 85.9333 |
| iIoU Categories | 69.7428 |
Class results
| Class | IoU | iIoU |
|---|---|---|
| road | 97.9262 | - |
| sidewalk | 81.4137 | - |
| building | 89.8609 | - |
| wall | 38.6137 | - |
| fence | 43.5661 | - |
| pole | 53.1829 | - |
| traffic light | 58.8496 | - |
| traffic sign | 64.3553 | - |
| vegetation | 91.0366 | - |
| terrain | 67.7472 | - |
| sky | 94.0262 | - |
| person | 75.9136 | 53.9215 |
| rider | 57.292 | 29.017 |
| car | 93.2387 | 86.0932 |
| truck | 55.876 | 19.2969 |
| bus | 67.7575 | 30.1973 |
| train | 55.0709 | 28.1058 |
| motorcycle | 49.3623 | 23.3532 |
| bicycle | 63.8673 | 44.7284 |
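The "IoU Classes" and "iIoU Classes" averages can be reproduced from the per-class rows above. The sketch below assumes each average is an unweighted mean over the classes that report a value (19 classes for IoU, 8 for iIoU); the dictionary names are illustrative, and the numbers are copied from the table.

```python
from statistics import mean

# Per-class IoU values copied from the class results table.
class_iou = {
    "road": 97.9262, "sidewalk": 81.4137, "building": 89.8609,
    "wall": 38.6137, "fence": 43.5661, "pole": 53.1829,
    "traffic light": 58.8496, "traffic sign": 64.3553,
    "vegetation": 91.0366, "terrain": 67.7472, "sky": 94.0262,
    "person": 75.9136, "rider": 57.292, "car": 93.2387,
    "truck": 55.876, "bus": 67.7575, "train": 55.0709,
    "motorcycle": 49.3623, "bicycle": 63.8673,
}

# Per-class iIoU values; only classes with a reported value are included.
class_iiou = {
    "person": 53.9215, "rider": 29.017, "car": 86.0932,
    "truck": 19.2969, "bus": 30.1973, "train": 28.1058,
    "motorcycle": 23.3532, "bicycle": 44.7284,
}

# Assumption: the averages are plain unweighted means over classes.
mean_iou = mean(class_iou.values())
mean_iiou = mean(class_iiou.values())

print(f"IoU Classes:  {mean_iou:.4f}")
print(f"iIoU Classes: {mean_iiou:.4f}")
```

Under this assumption the computed means agree with the reported averages (68.3661 and 39.3392) to within the rounding of the tabulated values.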
Category results
| Category | IoU | iIoU |
|---|---|---|
| flat | 98.1201 | - |
| nature | 90.5937 | - |
| object | 60.1353 | - |
| sky | 94.0262 | - |
| construction | 89.863 | - |
| human | 76.546 | 55.8137 |
| vehicle | 92.2484 | 83.6719 |
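The category-level averages can be checked the same way. This sketch again assumes unweighted means over the categories that report a value (7 for IoU, 2 for iIoU); names are illustrative and values are copied from the table above.

```python
from statistics import mean

# Per-category IoU values copied from the category results table.
category_iou = {
    "flat": 98.1201, "nature": 90.5937, "object": 60.1353,
    "sky": 94.0262, "construction": 89.863, "human": 76.546,
    "vehicle": 92.2484,
}

# Per-category iIoU values; only "human" and "vehicle" report one.
category_iiou = {"human": 55.8137, "vehicle": 83.6719}

# Assumption: the averages are plain unweighted means over categories.
mean_cat_iou = mean(category_iou.values())
mean_cat_iiou = mean(category_iiou.values())

print(f"IoU Categories:  {mean_cat_iou:.3f}")
print(f"iIoU Categories: {mean_cat_iiou:.3f}")
```

Because the tabulated inputs are already rounded to four decimals, the recomputed category IoU can differ from the reported 85.9333 in the last digit; the values agree to about three decimal places.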
