The Cityscapes Dataset

We present a new large-scale dataset that contains a diverse set of stereo video sequences recorded in street scenes from 50 different cities, with high-quality pixel-level annotations of 5 000 frames in addition to a larger set of 20 000 weakly annotated frames. The dataset is thus an order of magnitude larger than similar previous attempts. Details on annotated classes and examples of our annotations are available at this webpage.

The Cityscapes Dataset is intended for

  1. assessing the performance of vision algorithms for major tasks of semantic urban scene understanding: pixel-level, instance-level, and panoptic semantic labeling;
  2. supporting research that aims to exploit large volumes of (weakly) annotated data, e.g. for training deep neural networks.
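As a rough illustration of what assessing pixel-level semantic labeling can involve, the sketch below computes per-class intersection-over-union (IoU) between a predicted label map and a ground-truth label map. IoU is one commonly used metric for this task; the function names, the small integer class ids, and the ignore value of 255 are assumptions made for this example and are not taken from the dataset's own tooling.

```python
from collections import Counter


def per_class_iou(pred, truth, ignore_label=255):
    """Return {class_id: IoU} for two equally sized 2D label maps.

    pred and truth are lists of rows of integer class ids.
    ignore_label is a hypothetical marker for unlabeled ground-truth pixels.
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if t == ignore_label:
                continue  # unlabeled ground-truth pixels are skipped
            if p == t:
                tp[t] += 1  # correctly labeled pixel of class t
            else:
                fp[p] += 1  # pixel wrongly assigned to class p
                fn[t] += 1  # pixel of class t that was missed
    classes = set(tp) | set(fp) | set(fn)
    # IoU = TP / (TP + FP + FN), computed separately for each class
    return {c: tp[c] / (tp[c] + fp[c] + fn[c]) for c in sorted(classes)}


def mean_iou(pred, truth, ignore_label=255):
    """Average the per-class IoU values over all classes that occur."""
    ious = per_class_iou(pred, truth, ignore_label)
    return sum(ious.values()) / len(ious) if ious else 0.0


if __name__ == "__main__":
    # Toy 2x3 label maps with two classes and one ignored pixel.
    truth = [[0, 0, 1], [1, 1, 255]]
    pred = [[0, 1, 1], [1, 0, 7]]
    print(per_class_iou(pred, truth))
    print(mean_iou(pred, truth))
```

In practice the counts would be accumulated over all evaluated frames before the ratio is taken, so that each class score reflects the whole evaluation set rather than an average of per-image scores.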

Latest News

  • Cityscapes 3D Dataset Released
    Cityscapes 3D is an extension of the original Cityscapes with 3D bounding box annotations for all types of vehicles as well as a benchmark for the 3D detection task. For more details, please refer to our paper, presented at the CVPR 2020 Workshop on Scalability in Autonomous Driving. Today, we released our 3D bounding box annotations of all vehicle types, i.e. car, truck, bus, on rails, motorcycle, bicycle, caravan, and trailer. The box annotations feature a full 3D orientation including yaw, pitch, and roll labels. The annotations are available on our download page. Our toolbox supports the new annotations and […]


The Cityscapes Dataset is made freely available to academic and non-academic entities for non-commercial purposes such as academic research, teaching, scientific publications, or personal experimentation. Permission to use the data is granted provided that you agree to our license terms.