Method Details


Details for method 'OCNet_ResNet101_fine'

 

Method overview

name OCNet_ResNet101_fine
challenge pixel-level semantic labeling
details Context is essential for various computer vision tasks. State-of-the-art scene parsing methods define the context as the prior of the scene category (e.g., bathroom, bedroom, street). Such scene context is not suitable for street-scene parsing, as most of the scenes are similar. In this work, we propose the Object Context, which captures the prior of the object category that each pixel belongs to. We compute the object context by aggregating all pixels' features according to an attention map that encodes, for each pixel, the probability that it belongs to the same category as the associated pixel. Specifically, we employ the self-attention method to compute the pixel-wise attention map. We further propose the Pyramid Object Context and the Atrous Spatial Pyramid Object Context to handle multiple scales.
publication Anonymous
project page / code
used Cityscapes data fine annotations
used external data ImageNet
runtime n/a
subsampling no
submission date August 2018
previous submissions
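The aggregation step in the method description (a pixel-wise attention map computed by self-attention, then used to pool all pixels' features) can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: the function name `object_context` is hypothetical, and the real model uses learned query/key/value projections on CNN feature maps rather than raw dot products.

```python
import numpy as np

def object_context(feats):
    """Sketch of pixel-wise self-attention aggregation.

    feats: (N, C) array of pixel features (N = H*W flattened pixels).
    Returns (N, C) object-context features: each pixel's output is a
    weighted sum of all pixels' features, with weights given by a
    softmax over pairwise similarities (the attention map).
    """
    n, c = feats.shape
    # Pairwise similarity between pixels (plain dot product here; the
    # actual method applies learned projections before this step).
    scores = feats @ feats.T / np.sqrt(c)
    # Row-wise softmax -> attention map: row i encodes the probability
    # that each pixel belongs to the same category as pixel i.
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)
    # Aggregate all pixels' features according to the attention map.
    return attn @ feats
```

With identical features at every pixel the attention becomes uniform and the output equals the input, which is the expected degenerate case for this kind of weighted averaging.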

 

Average results

Metric Value
IoU Classes 81.1548
iIoU Classes 61.2678
IoU Categories 91.6382
iIoU Categories 81.1396

 

Class results

Class IoU iIoU
road 98.7466 -
sidewalk 87.1032 -
building 93.7191 -
wall 59.3567 -
fence 62.3087 -
pole 69.6428 -
traffic light 77.9923 -
traffic sign 80.7687 -
vegetation 93.9114 -
terrain 72.5633 -
sky 95.7587 -
person 87.5388 72.0626
rider 73.4976 54.5946
car 96.3655 90.7298
truck 73.6252 43.9236
bus 88.2223 57.5298
train 80.5809 53.2843
motorcycle 71.8972 51.2961
bicycle 78.3419 66.7217

 

Category results

Category IoU iIoU
flat 98.7554 -
nature 93.5601 -
object 75.7191 -
sky 95.7587 -
construction 94.0309 -
human 87.7125 73.2284
vehicle 95.9307 89.0509

 
