StillBox Dataset

Overview

StillBox is a synthetic, depth-enabled dataset rendered with Blender, featuring rigid scenes with stabilized images. It aims to mimic a consumer drone flight, with very heterogeneous scene components of random textures and sizes. As a result, depth is very difficult to infer from context alone, and structure-from-motion depth algorithms that focus on robustness should have an advantage over single-frame algorithms.

Features

  • Perfectly known depth for every image

  • More than 800K frames at 64x64 and 32K frames at 512x512

  • Depth values from 10cm to 200m

  • Displacement of 10cm between two consecutive frames

  • Random shapes; half are textured with randomly gathered Flickr photos, half with a simple color ramp


The Still Box dataset comes in 4 different image sizes. Here is a brief summary:

Image size   Number of scenes   Total size (GB)   Compressed size (GB)   Download link
64x64        80K                19                9.8                    Download
128x128      16K                12                7.1                    Download
256x256      3.2K               8.5               5                      Download
512x512      3.2K               33                19                     Download

A more recent version of the Still Box dataset, with orientation changes and a lower field of view, as well as 16:9 images, is also available.

Image size   Number of scenes   Total size (GB)   Compressed size (GB)   Download link
512x288      3.2K               33                19                     Link available soon

Dataset description

Each dataset consists of 16 folders.
Each of these folders contains a metadata.json file describing its content:

{
  "args": {},
  "scenes":[],
  "fov": 90,
  "scenes_nb": 5000,
  "resolution":[64,64]
}
  • fov is in degrees and refers to the horizontal FOV in the case of rectangular frames
  • resolution is in pixels
  • args is a dump of the Blender script settings used when the scenes were generated
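
As a sketch, a folder's metadata can be parsed with any standard JSON library. The field names below follow the structure above; the example values are illustrative, and in practice you would read the metadata.json file from disk:

```python
import json

# Illustrative metadata string matching the structure described above;
# a real script would load <folder>/metadata.json instead.
metadata_text = """
{
  "args": {},
  "scenes": [],
  "fov": 90,
  "scenes_nb": 5000,
  "resolution": [64, 64]
}
"""

metadata = json.loads(metadata_text)
width, height = metadata["resolution"]   # in pixels
horizontal_fov = metadata["fov"]         # in degrees
print(horizontal_fov, width, height)
```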

Each element of the scenes list has the same structure:

{
  "depth": [],
  "imgs":[],
  "length":10,
  "speed": [x,y,z],
  "orientation": [[w,x,y,z], ...],
  "time_step":0.1
}
  • depth and imgs are lists of file paths. Both should have the length specified in length, and the nth element of depth should match the nth element of imgs
  • speed is either a 3D vector or an array of n 3D vectors. Coordinates are in m/s, defined in a [Right Up Forward] system relative to the camera. If there is only one 3D vector, the speed was the same throughout the scene. Otherwise, the ith speed is the mean speed between the ith and (i+1)th frames
  • orientation is an array of n 4D quaternions, relative to the camera at the first frame. The ith orientation is the orientation of the ith frame. When the orientation is constant, the field is not present in the JSON
  • time_step is the time between two consecutive frames, in seconds

To get the 3D displacement between frame t and frame t + shift, compute displacement = shift * time_step * speed for homogeneous speed, or displacement = time_step * \(\sum_{i=0}^{shift-1} speed[t+i] \) otherwise.
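
As a minimal sketch of these two formulas (the function name and arguments are illustrative, not part of the dataset tooling):

```python
def displacement(speed, time_step, t, shift):
    """3D displacement between frame t and frame t + shift, in meters."""
    if isinstance(speed[0], (int, float)):
        # Single 3D vector: speed was constant for the whole scene.
        return [shift * time_step * c for c in speed]
    # List of n 3D vectors: sum the mean speeds between consecutive frames.
    return [time_step * sum(v[axis] for v in speed[t:t + shift])
            for axis in range(3)]

# Constant 1 m/s forward, 0.1 s between frames, 3-frame shift: about 0.3 m forward.
print(displacement([0.0, 0.0, 1.0], 0.1, 0, 3))
```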

args has this structure (among other options)

{
  "clip": [0.1, 200],
  "meshes_nb": 20,
  "meshes_var": [4.0, 15.0],
  "texture_ratio": 0.5
}
  • clip is the clipping distance: objects nearer than 0.1m or farther than 200m won't appear
  • meshes_nb is the number of shapes in each scene; you may not see all of them at once in the frames
  • meshes_var is the variation in size and position of the meshes of the scene, both in meters
  • texture_ratio is the ratio of textured shapes; the other shapes have a uniform color texture

How to use

Still Box is currently used in two projects:

  • DepthNet

    • This project only uses the simple Still Box, with homogeneous speed and constant orientation
    • See a training example on Github
    • More info about the project here
  • Unsupervised DepthNet

    • This project uses the version with rotation and performs an a posteriori stabilization
    • See a training example on Github
    • More info about the project here

Torrent links


Citation

If you use this dataset in your research, please cite it with the following BibTeX entry:

@Article{depthnet17,
AUTHOR = {Pinard, Clement and Chevalley, Laure and Manzanera, Antoine and Filliat, David},
TITLE = {End-to-end depth from motion with stabilized monocular videos},
JOURNAL = {ISPRS Annals of Photogrammetry, Remote Sensing and Spatial Information Sciences},
VOLUME = {IV-2/W3},
YEAR = {2017},
PAGES = {67--74},
URL = {https://www.isprs-ann-photogramm-remote-sens-spatial-inf-sci.net/IV-2-W3/67/2017/},
DOI = {10.5194/isprs-annals-IV-2-W3-67-2017}
}