StillBox is a depth-enabled synthetic dataset built with Blender, featuring rigid scenes rendered as stabilized images. It aims to mimic a consumer drone flight, with very heterogeneous scene components of random textures and sizes. As a result, depth is very difficult to infer from context alone, and structure-from-motion depth algorithms focused on robustness should have an advantage over single-frame algorithms.
* Perfectly known depth for every image
* More than 800K frames at 64x64 and 32K frames at 512x512
* Depth values from 10cm to 200m
* Displacement of 10cm between two consecutive frames
* Random shapes; half are textured with randomly gathered Flickr photos, half with a simple color ramp
The Still Box Dataset consists of 4 different image sizes. Here is a brief recap of the sizes.
A more recent version of the Still Box Dataset, with orientation changes, a lower field of view, and 16/9 images, is also available.
Image size | Number of scenes | Total size (GB) | Compressed size (GB) | Download link |
---|---|---|---|---|
512x288 | 3.2K | 33 | 19 | Link available soon |
Each dataset consists of 16 folders. Each of these folders contains a `metadata.json` file that describes the content of the folder:
```json
{
    "args": {},
    "scenes": [],
    "fov": 90,
    "scenes_nb": 5000,
    "resolution": [64, 64]
}
```
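As a minimal sketch of how these files might be gathered (assuming the 16 folders are direct subdirectories of the dataset root; the exact layout and folder names are assumptions, not part of the dataset specification):

```python
import json
from pathlib import Path

def load_metadata(dataset_root):
    """Collect the metadata.json of every folder found under the dataset root."""
    metadata = {}
    for folder in sorted(Path(dataset_root).iterdir()):
        meta_file = folder / "metadata.json"
        if meta_file.is_file():
            with meta_file.open() as f:
                metadata[folder.name] = json.load(f)
    return metadata

# Hypothetical usage:
# all_meta = load_metadata("still_box/512x512")
# for name, meta in all_meta.items():
#     print(name, meta["scenes_nb"], "scenes at resolution", meta["resolution"])
```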
Each element of the list of scenes has the same structure:
```json
{
    "depth": [],
    "imgs": [],
    "length": 10,
    "speed": [x, y, z],
    "orientation": [[w, x, y, z], ...],
    "time_step": 0.1
}
```
* `depth` and `imgs` are lists of file paths. They should both have the length specified in `length`, and the nth element of `depth` should match the nth element of `imgs`.
* `speed` is either a single 3D vector or an array of n 3D vectors. Coordinates are in m/s, defined in the [Right, Up, Forward] system, relative to the camera. If there is only one 3D vector, the speed was the same for the whole scene. Otherwise, the ith speed is the mean speed between the ith and (i+1)th frames.
* `orientation` is an array of n 4D quaternions [w, x, y, z], relative to the camera at the first frame.
* `time_step` is the time between two consecutive frames.

To get the 3D displacement between frame t and frame t + shift, you can compute displacement = shift * time_step * speed when the speed is constant, or in the general case:

displacement = time_step * \(\sum_{i=0}^{shift-1} speed[t+i]\)
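A minimal sketch of that computation, assuming `scene` is one element of the `scenes` list described above (the branching between a single constant speed and a per-frame speed list is an interpretation of the description, not code shipped with the dataset):

```python
import numpy as np

def displacement(scene, t, shift):
    """3D displacement (in meters) of the camera between frame t and frame t + shift."""
    speed = np.asarray(scene["speed"], dtype=np.float64)
    time_step = scene["time_step"]
    if speed.ndim == 1:
        # A single 3D vector: the speed was constant over the whole scene.
        return shift * time_step * speed
    # One 3D vector per frame interval: sum the mean speeds of the
    # `shift` intervals between frame t and frame t + shift.
    return time_step * speed[t:t + shift].sum(axis=0)

# Hypothetical usage: displacement between frames 2 and 5 of the first scene
# d = displacement(metadata["scenes"][0], t=2, shift=3)
```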
`args` has this structure (among other options):
```json
{
    "clip": [0.1, 200],
    "meshes_nb": 20,
    "meshes_var": [4.0, 15.0],
    "texture_ratio": 0.5
}
```
* `clip` is the clipping range: objects nearer than 0.1m or farther than 200m won't appear.
* `meshes_nb` is the number of shapes in each scene; you may not see all of them at once in the frames.
* `meshes_var` is the variation in size and position of the meshes of the scene, both in meters.
* `texture_ratio` is the ratio of textured shapes. The other shapes have a plain color texture.

Still Box is currently used in two projects:
If you use this dataset in your research, please cite it with the following entry:
```
@Article{depthnet17,
  AUTHOR = {Pinard, Clement and Chevalley, Laure and Manzanera, Antoine and Filliat, David},
  TITLE = {End-to-end depth from motion with stabilized monocular videos},
  JOURNAL = {ISPRS Annals of Photogrammetry, Remote Sensing and Spatial Information Sciences},
  VOLUME = {IV-2/W3},
  YEAR = {2017},
  PAGES = {67--74},
  URL = {https://www.isprs-ann-photogramm-remote-sens-spatial-inf-sci.net/IV-2-W3/67/2017/},
  DOI = {10.5194/isprs-annals-IV-2-W3-67-2017}
}
```