UvA Multi-Camera Multi-Person Tracking Benchmark

This page covers the multi-camera, multi-person surveillance datasets presented in:

M. C. Liem and D. M. Gavrila. “A comparative study on multi-person tracking using overlapping cameras”. Proc. of the International Conference on Computer Vision Systems (ICVS), vol. 7963 in Lecture Notes in Computer Science, pp. 203–212, 2013.

There are two datasets, containing multi-camera, multi-person scenarios in a surveillance setting. The Train station dataset is recorded with a small number of people, outside, on a train platform. The Hall dataset is more challenging; it is recorded with an increasing number of people (up to 23) in an indoor environment.

Train station dataset 

The outdoor "Train station data" contains 14 sequences totaling 8543 frames, recorded on a train platform. Between two and five actors enact various situations ranging from persons waiting for a train to fighting hooligans. The scenes have dynamic backgrounds with trains passing by and people walking on the train platform. Lighting conditions vary significantly over time. The area of interest is 7.6 x 12 m and is viewed by three overlapping, frame-synchronized cameras. Frames have a resolution of 752 x 560 pixels, recorded at 20 fps. Ground truth (GT) person locations are obtained at each frame by labeling torso positions: the shoulder and pelvis locations of all persons are annotated in all cameras and projected onto the ground plane.

Hall dataset

The indoor "Hall data" is a single 9151-frame sequence recorded in a large central hall. During the first half of the scene, actors move in and out of the scene in small groups. After this, two groups of about eight people each enter one by one and start arguing and fighting. The 12 x 12 m area of interest is viewed by four overlapping, frame-synchronized cameras. Frames have a resolution of 1024 x 768 pixels, recorded at 20 fps. GT positions are generated every 20th frame by annotating every person's head location in every camera, triangulating these points in 3D and projecting them onto the ground plane.

Data description

RAW images

All frames are recorded in RAW format using a GRGB Bayer filter and have to be interpolated to construct an RGB image. This can for example be done using the OpenCV function cvtColor(RAW_IM, RGB_IM, CV_BayerGR2BGR, 3), where RAW_IM is the input image and RGB_IM is the output image. Note that this conversion code stores the output channels in BGR order, OpenCV's default.
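
As a rough illustration, the same conversion can be sketched in Python with OpenCV's Python bindings. The sketch assumes (this is not stated on this page) that each .raw file is a headerless dump with one 8-bit sample per pixel in row-major order; the function names are hypothetical. The resolution has to be supplied by the caller (752 x 560 for the Train station data, 1024 x 768 for the Hall data).

```python
from pathlib import Path


def read_raw_bytes(path, width, height):
    """Read one Bayer frame and check that its size matches width * height.

    Assumption: the file is a headerless dump with one byte per pixel.
    """
    data = Path(path).read_bytes()
    expected = width * height
    if len(data) != expected:
        raise ValueError(f"{path}: expected {expected} bytes, found {len(data)}")
    return data


def demosaic(data, width, height):
    """Interpolate the Bayer mosaic to a 3-channel image (BGR channel order)."""
    # Imported here so that the reader above works without NumPy/OpenCV installed.
    import cv2
    import numpy as np

    mosaic = np.frombuffer(data, dtype=np.uint8).reshape(height, width)
    return cv2.cvtColor(mosaic, cv2.COLOR_BayerGR2BGR)
```

If the size check fails for a file, the one-byte-per-pixel assumption does not hold for that file and the reader needs to be adapted.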


Frames are provided in a separate ZIP file per camera. For the Train station data, there are 14 sub folders for the 14 scenarios. Because the Hall data consists of one very long scene, the data has been divided over 38 sub folders numbered 000000 - 000037. Each of these sub folders contains 250 images (1000 images if all cameras are put together), except for the first and last sub folders, which contain 80 and 71 files respectively.

All files have been named according to the following scheme: frame<XXXXXX>_cam<X>_<XXXXXXXXXX.XXXXXX>.raw, where the first X-es are the frame number, the next X is the camera number and the last set of X-es is the frame timestamp, provided as a UNIX timestamp in seconds (with microsecond accuracy).
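
The naming scheme above can be parsed with a regular expression. The following is a minimal sketch; the function and type names are hypothetical, and the file name in the usage note is made up for illustration. The timestamp is kept as two integers (seconds and microseconds) so that no precision is lost to floating-point rounding.

```python
import re
from typing import NamedTuple

# Pattern for frame<XXXXXX>_cam<X>_<XXXXXXXXXX.XXXXXX>.raw
_FRAME_NAME = re.compile(r"frame(\d+)_cam(\d+)_(\d+)\.(\d{1,6})\.raw")


class FrameName(NamedTuple):
    frame: int
    camera: int
    seconds: int       # whole seconds of the UNIX timestamp
    microseconds: int  # fractional part, in microseconds


def parse_frame_name(name):
    """Split a frame file name into frame number, camera number and timestamp."""
    match = _FRAME_NAME.fullmatch(name)
    if match is None:
        raise ValueError(f"unexpected file name: {name!r}")
    frame, camera, seconds, fraction = match.groups()
    # Right-pad the fraction so that ".5" would mean 500000 microseconds.
    return FrameName(int(frame), int(camera), int(seconds),
                     int(fraction.ljust(6, "0")))
```

For example, parse_frame_name("frame000123_cam2_1357049655.250000.raw") yields frame 123, camera 2, 1357049655 seconds and 250000 microseconds.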

Background images

Together with the frames for all scenarios, frames showing the empty scene are provided to learn background models. 10 frames are provided in RAW format for the Hall dataset and most Train station scenarios. For two Train station scenarios (scenario12-2 and scenario13-2), no frames of the empty scene with matching lighting conditions were available. For these scenarios, an empty background image was constructed for each camera. These images are provided as PPM files named <X>_bgim.ppm, where <X> is the camera number. Background files can be found in the "backgrounds" folder in each camera folder.

Camera calibration

All cameras are calibrated intrinsically as well as extrinsically. Calibration parameters are provided in separate ZIP files and are stored in files named <data>_c<x>_calib.txt, where <data> is either "train" for the train station data or "hall" for the hall data, and <x> is the camera number. Together with the data for the intrinsic and extrinsic camera calibration matrices, these files also provide the distortion coefficients k1-k5.

Furthermore, groundPlane.gpl files defining the ground plane are provided, in which the ground plane is defined by its normal vector (3 coordinates) and an offset along that vector. Finally, files named <data>_roi.txt (with <data> either "train" or "hall") are provided, defining the position and orientation of the regions of interest in world coordinates.
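
To illustrate how a normal-plus-offset plane definition can be used, the sketch below orthogonally projects a 3D point onto such a plane. It assumes that the offset is the signed distance from the origin to the plane, measured along the normalized normal vector (that is, the plane is n . x = offset); the exact sign and scale convention of the groundPlane.gpl files should be checked against the data. The function name is hypothetical.

```python
import math


def project_to_plane(point, normal, offset):
    """Orthogonally project a 3D point onto the plane n . x = offset.

    Assumption: `offset` is the signed distance from the origin to the plane,
    measured along the normalized normal vector.
    """
    length = math.sqrt(sum(c * c for c in normal))
    if length == 0.0:
        raise ValueError("normal vector must be non-zero")
    n = [c / length for c in normal]
    # Signed distance from the point to the plane.
    distance = sum(nc * pc for nc, pc in zip(n, point)) - offset
    return tuple(pc - distance * nc for pc, nc in zip(point, n))
```

With the normal (0, 0, 2) and offset 0.5, the point (1, 2, 3) is projected to (1, 2, 0.5).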

Annotations

Annotations are provided in separate ZIP files, as a single TXT file per scenario. Person positions are given as XY ground plane coordinates, in millimeters. Each file is structured as follows:

The first row contains column labels. The columns p<X>_x and p<X>_y represent the annotated x and y ground plane coordinates for person <X>. When no annotation is available for a certain person at a certain time step, its XY position is specified as <NaN, NaN>.
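
A minimal sketch of reading such a file in Python is given below. The field separator is an assumption (whitespace and/or commas), and any columns other than p<X>_x and p<X>_y are ignored; the function name is hypothetical.

```python
import math
import re

_PERSON_COLUMN = re.compile(r"p(\d+)_([xy])")


def parse_annotations(text):
    """Return one dict per data row, mapping person id -> (x, y) in millimeters.

    Persons whose position is NaN in a row are left out of that row's dict.
    Assumption: fields are separated by whitespace and/or commas.
    """
    lines = [line for line in text.splitlines() if line.strip()]
    header = re.split(r"[,\s]+", lines[0].strip())
    # Map person id -> {"x": column index, "y": column index}.
    columns = {}
    for index, label in enumerate(header):
        match = _PERSON_COLUMN.fullmatch(label)
        if match:
            columns.setdefault(int(match.group(1)), {})[match.group(2)] = index
    rows = []
    for line in lines[1:]:
        fields = re.split(r"[,\s]+", line.strip())
        positions = {}
        for person, idx in columns.items():
            x, y = float(fields[idx["x"]]), float(fields[idx["y"]])
            if not (math.isnan(x) or math.isnan(y)):
                positions[person] = (x, y)
        rows.append(positions)
    return rows
```

Python's float() accepts the string "NaN" directly, so missing positions need no special parsing and are filtered out with math.isnan().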

Evaluation

The file evaluation.zip contains a Matlab script to compute the CLEAR MOT metrics used in the above publication. The script calculate_metrics.m takes two TXT files as input, containing tracking results and annotations, and computes the CLEAR MOT metrics from these files. In order to use this script, the tracking results should be provided in the same format as the annotation data, where tracks that last for periods of time shorter than the full scenario length are padded with NaN values, similar to missing values in the annotation data. Note that the script assumes the first line of each file to contain column labels, and as such, skips the first line when reading the file!
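
The NaN padding described above could be produced along the following lines. This is a sketch under assumptions: fields are taken to be space-separated and only p<X>_x / p<X>_y columns are written, so the output should be compared against an actual annotation file before it is passed to the Matlab script. The function name and the track representation are hypothetical.

```python
def format_tracks(tracks, num_rows):
    """Build the lines of a results file from per-person tracks.

    `tracks` maps a person id to a dict {row index: (x, y)}; rows in which a
    person is not tracked are written as NaN NaN.
    Assumption: space-separated fields and only p<X>_x / p<X>_y columns.
    """
    ids = sorted(tracks)
    # Header row with column labels (skipped by the evaluation script).
    lines = [" ".join(f"p{i}_x p{i}_y" for i in ids)]
    for row in range(num_rows):
        fields = []
        for i in ids:
            position = tracks[i].get(row)
            if position is None:
                fields += ["NaN", "NaN"]
            else:
                fields += [f"{position[0]:.1f}", f"{position[1]:.1f}"]
        lines.append(" ".join(fields))
    return lines
```

Every track then spans the full scenario length, with NaN entries wherever the tracker reported no position for that person.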

As in the publication mentioned above, the evaluation of tracking results is done in the area covered by all cameras. If a detection or annotation is outside this area, or is matched to an annotation or detection outside this area, it is counted neither as a correct match nor as an error. The area covered by all cameras is specified as a binary lookup image matching the region of interest. The images are found in the evaluation folder, named boundim_3_cam_trainstation.png (Train station data) and boundim_4_cam_hall.png (Hall data), and are automatically used by the evaluation script when specifying 'trainstation' or 'hall' as input to the compute_metrics() function.

License Terms

 This dataset is made available to the scientific community for non-commercial research purposes such as academic research, teaching, scientific publications, or personal experimentation. Permission is granted to use, copy, and distribute the data given that you agree:

  1. That the dataset comes "AS IS", without express or implied warranty. Although every effort has been made to ensure accuracy, the University of Amsterdam (or this webhost) does not accept any responsibility for errors or omissions.
  2. That you include a reference to the above publication in any published work that makes use of the dataset.
  3. That if you have altered the content of the dataset or created derivative work, prominent notices are made so that any recipients know that they are not receiving the original data.
  4. That you may not use or distribute the dataset or any derivative work for commercial purposes as, for example, licensing or selling the data, or using the data with a purpose to procure a commercial gain.
  5. That this original license notice is retained with all copies or derivatives of the dataset.
  6. That all rights not expressly granted to you are reserved by the University of Amsterdam.


Copyright © 2001-2013 Gavrila. All rights reserved. Disclaimer.