[Dataset] This grocery food dataset contains 1719 videos with 23 classes, which are subdivided into 98 subclasses. The videos were recorded in two SPAR grocery stores in HD (1920x1080 and 1280x720) using five mobile phones (Samsung Galaxy S2, Samsung Galaxy S3, Motorola Moto G, HTC One, LG Nexus 4).
- MANGO - Mobile Augmented Reality with Functional Eating Guidance and Food Awareness (bib) In Proc. International Workshop on Multimedia Assisted Dietary Management (MADIMA, in conjunction with ICIAP), 2015
[Dataset] This dataset contains 7 challenging volleyball activity classes annotated in 6 videos from professionals in the Austrian Volley League (season 2011/12). A total of 36178 annotations within 18960 frames are provided along with the HD video files (1920x1080 @25fps, DX50 codec). The 7 classes consist of 5 volleyball-specific classes ('Serve', 'Reception', 'Setting', 'Attack', 'Block') and 2 more general classes ('Stand', 'Defense/Move').
Download: Code (vb14_code.zip (2.5MB)), Videos (graz-arbesbach_2.avi (2GB), graz-arbesbach_3.avi (2GB), graz-arbesbach_4.avi (2GB), graz-arbesbach_5.avi (2GB), graz-gleisdorf_1.avi (3GB), graz-klagenfurt1_2.avi (2GB))
- Improved Sport Activity Recognition using Spatio-temporal Context (bib) In Proc. DVS-Conference on Computer Science in Sport (DVS/GSSS), 2014
- Indoor Activity Detection and Recognition for Automated Sport Games Analysis (bib) In Proc. Workshop of the Austrian Association for Pattern Recognition (AAPR/OAGM), 2014
[Dataset] Annotated Facial Landmarks in the Wild
[Dataset] Annotated Facial Landmarks in the Wild (AFLW) provides a large-scale collection of annotated face images gathered from the web, exhibiting a large variety in appearance (e.g., pose, expression, ethnicity, age, gender) as well as general imaging and environmental conditions. In total about 25k faces are annotated with up to 21 landmarks per image.
- Robust Face Detection by Simple Means (bib) In Computer Vision in Applications Workshop (DAGM), 2012
- Annotated Facial Landmarks in the Wild: A Large-scale, Real-world Database for Facial Landmark Localization (bib) In First IEEE International Workshop on Benchmarking Facial Image Analysis Technologies, 2011
[Code] Large Scale Metric Learning from Equivalence Constraints
We provide the code and data to reproduce all experiments of our CVPR'12 paper Large Scale Metric Learning from Equivalence Constraints. In this paper, we raise important issues regarding the scalability and the required degree of supervision of existing Mahalanobis metric learning methods. These methods often rely on tedious optimization procedures that become computationally intractable on a large scale. With our Keep It Simple and Straightforward MEtric (KISSME), we introduce a simple yet effective strategy to learn a distance metric from equivalence constraints. Our method is orders of magnitude faster than comparable methods.
- Large Scale Metric Learning from Equivalence Constraints (bib) In Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2012
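The core of KISSME can be sketched in a few lines: estimate the covariance of pairwise feature differences over the similar and the dissimilar pairs, take the difference of their inverses, and re-project the result onto the PSD cone. The following is a minimal NumPy sketch of that idea; it omits the PCA dimensionality reduction and regularization used in the released package.

```python
import numpy as np

def kissme(X, sim_pairs, dis_pairs):
    """KISSME sketch: M = inv(Cov_sim) - inv(Cov_dis), clipped to PSD.

    X: (n, d) feature matrix; sim_pairs / dis_pairs: lists of index
    pairs known to show the same / different objects.
    """
    def pair_cov(pairs):
        # Covariance of pairwise differences over the given constraints.
        D = np.stack([X[i] - X[j] for i, j in pairs])
        return D.T @ D / len(pairs)

    M = np.linalg.inv(pair_cov(sim_pairs)) - np.linalg.inv(pair_cov(dis_pairs))
    # Re-project onto the PSD cone so M induces a valid (pseudo-)metric.
    w, V = np.linalg.eigh(M)
    return V @ np.diag(np.clip(w, 0.0, None)) @ V.T

def mahal_dist(M, x, y):
    """Learned squared Mahalanobis distance (x - y)^T M (x - y)."""
    d = x - y
    return float(d @ M @ d)
```

In practice the features are typically PCA-reduced first so that both covariance estimates stay well conditioned.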
Synergy-based Learning of Facial Identity
In Proc. DAGM Symposium, 2012
(Winner of the Best Paper Award)
We provide a reimplementation of our CVPR'15 paper In Defense of Color-based Model-free Tracking. In this work, we address color-based online object tracking where neither class-specific prior knowledge nor pre-learned object models are available. Combining a discriminative object-vs-background model with an additional distractor-aware term, we show that trackers based on standard color representations (such as histograms) can achieve state-of-the-art performance on a variety of test sequences.
Download: Code & demo data (~ 5.4 MB, MATLAB)
Contact: Horst Possegger, Thomas Mauthner
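The discriminative object-vs-background part of such a model can be illustrated with histogram-based Bayes classification of pixel colors. This is a simplified sketch only: the bin count and equal-prior assumption are ours, and the paper's distractor-aware term and localization stage are omitted.

```python
import numpy as np

def color_likelihood(frame, obj_mask, bins=16):
    """Per-pixel object likelihood from quantized RGB histograms.

    frame: (h, w, 3) uint8 image; obj_mask: (h, w) bool object region.
    Returns P(object | pixel color) via Bayes with equal priors.
    """
    step = 256 // bins
    idx = (frame // step).astype(np.int64)
    # Joint RGB bin index per pixel.
    flat = idx[..., 0] * bins * bins + idx[..., 1] * bins + idx[..., 2]
    h_obj = np.bincount(flat[obj_mask], minlength=bins ** 3).astype(float)
    h_bg = np.bincount(flat[~obj_mask], minlength=bins ** 3).astype(float)
    h_obj /= max(h_obj.sum(), 1.0)
    h_bg /= max(h_bg.sum(), 1.0)
    posterior = h_obj / (h_obj + h_bg + 1e-12)
    return posterior[flat]
```

Pixels whose color occurs mostly inside the object region score close to 1, background-dominated colors close to 0; tracking then reduces to locating the region with the highest accumulated likelihood.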
We provide the code of our CVPR'14 paper Occlusion Geodesics for Online Multi-Object Tracking. In this work, we address the problem of correctly assigning noisy detection results to trajectories of multiple objects. In particular, we exploit the spatio-temporal evolution of occlusion regions, detector reliability, and target motion prediction to robustly handle missed detections. In combination with a conservative association scheme for visible objects, this allows for real-time tracking of multiple objects from a single static camera.
Download: Code & demo data (~ 31 MB, MATLAB)
- Occlusion Geodesics for Online Multi-Object Tracking (bib) (code) In Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2014
Contact: Horst Possegger, Thomas Mauthner
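The conservative association of visible targets can be illustrated by a distance-gated greedy assignment between predicted track positions and detections. This is a generic simplification, not the paper's occlusion-geodesic formulation, and the 50-pixel gate is an arbitrary assumption.

```python
import numpy as np

def greedy_associate(tracks, detections, gate=50.0):
    """Greedily assign detections to predicted track positions.

    tracks: (n, 2) predicted positions; detections: (m, 2) detections.
    Returns {track_index: detection_index}. Pairs farther apart than
    `gate` pixels stay unassigned, leaving ambiguous targets to a
    later (e.g. occlusion-aware) reasoning stage.
    """
    if len(tracks) == 0 or len(detections) == 0:
        return {}
    cost = np.linalg.norm(tracks[:, None, :] - detections[None, :, :], axis=-1)
    assignment = {}
    while np.isfinite(cost).any():
        t, d = np.unravel_index(np.argmin(cost), cost.shape)
        if cost[t, d] > gate:
            break  # remaining pairs are too ambiguous to commit to
        assignment[int(t)] = int(d)
        cost[t, :] = np.inf  # each track and detection is used at most once
        cost[:, d] = np.inf
    return assignment
```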
This dataset contains 6 indoor people tracking scenarios recorded at our laboratory using 4 static Axis P1347 cameras:
- Changing appearance (Chap): This sequence depicts a standard surveillance scenario, where 5 people move unconstrained within the laboratory. Throughout the scene, the people change their visual appearance by putting on jackets with significantly different colors than their sweaters.
- Leapfrogs (Leaf 1 & 2): These scenarios depict leapfrog games where players leap over each other’s stooped backs. Specific challenges of these sequences are the spatial proximity of players, out-of-plane motion, and difficult poses.
- Musical chairs (Much): This sequence shows 4 people playing musical chairs and a non-playing moderator who starts and stops the recorded music. Due to the nature of this game, this sequence exhibits fast motion, as well as crowded situations, e.g., when all players race to the available chairs. Furthermore, sitting on the chairs is a rather unusual pose for typical surveillance scenarios and violates the commonly used constraint of standing persons.
- Pose: This sequence shows up to 6 people in various poses, such as standing, walking, kneeling, crouching, crawling, sitting, and stepping on ladders.
- Table: This scenario exhibits significant out-of-plane motion as up to 5 people walk and jump over a table.
For each scenario, we provide the synchronized video streams, the full (extrinsic & intrinsic) camera calibration, manually annotated groundtruth for every 10th frame, as well as a top-view model of the ground-plane.
- Robust Real-Time Tracking of Multiple Objects by Volumetric Mass Densities (bib) (data) In Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2013
Contact: Horst Possegger
This dataset contains the video streams and calibrations of several static Axis P1347 cameras and one panoramic video from a spherical Point Grey Ladybug3 camera for two scenarios. The first scenario (outdoor) shows a crowded campus of our university, while the second sequence (indoor) was recorded during the preparations of a European handball training game at a sports hall in Graz. The panoramic imagery can be used to simulate a PTZ camera with the provided implementation of the virtual PTZ (vPTZ) camera.
- Unsupervised Calibration of Camera Networks and Virtual PTZ Cameras (bib) (code) (data) In Proc. Computer Vision Winter Workshop (CVWW), 2012
Contact: Horst Possegger, Sabine Sternig
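Simulating a PTZ view from the equirectangular panorama amounts to casting pinhole rays for the output image, rotating them by pan and tilt, and sampling the panorama at the corresponding longitude/latitude. The following nearest-neighbor sketch illustrates the geometry only; the provided vPTZ implementation should be preferred, and the panorama's angular convention here is an assumption.

```python
import numpy as np

def vptz_view(pano, pan, tilt, fov, out_w=320, out_h=240):
    """Render a perspective view from an equirectangular panorama.

    pan, tilt, fov are in radians; sampling is nearest-neighbor for brevity.
    """
    H, W = pano.shape[:2]
    f = 0.5 * out_w / np.tan(0.5 * fov)  # pinhole focal length in pixels
    x, y = np.meshgrid(np.arange(out_w) - out_w / 2.0,
                       np.arange(out_h) - out_h / 2.0)
    rays = np.stack([x, y, np.full_like(x, f)], axis=-1)
    ct, st = np.cos(tilt), np.sin(tilt)
    cp, sp = np.cos(pan), np.sin(pan)
    Rt = np.array([[1, 0, 0], [0, ct, -st], [0, st, ct]])  # tilt about x
    Rp = np.array([[cp, 0, sp], [0, 1, 0], [-sp, 0, cp]])  # pan about y
    rays = rays @ (Rp @ Rt).T
    # Ray direction -> spherical coordinates -> panorama pixel.
    lon = np.arctan2(rays[..., 0], rays[..., 2])
    lat = np.arcsin(rays[..., 1] / np.linalg.norm(rays, axis=-1))
    u = ((lon / np.pi + 1.0) * 0.5 * (W - 1)).astype(int)
    v = ((lat / np.pi + 0.5) * (H - 1)).astype(int)
    return pano[np.clip(v, 0, H - 1), np.clip(u, 0, W - 1)]
```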
[Dataset] PRID 2011
This dataset was created in co-operation with the Austrian Institute of Technology for the purpose of testing person re-identification approaches. The dataset consists of images extracted from multiple person trajectories recorded from two different static surveillance cameras. Images from these cameras contain a viewpoint change and a stark difference in illumination, background and camera characteristics. Since images are extracted from trajectories, several different poses per person are available in each camera view. We have recorded 475 person trajectories from one view and 856 from the other one, with 245 persons appearing in both views. Details can be found here.
Person Re-Identification by Descriptive and Discriminative Classification
In Proc. Scandinavian Conference on Image Analysis (SCIA), 2011
(The original publication is available at www.springerlink.com)
Contact: Martin Hirzer
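Re-identification results on PRID are commonly reported as Cumulative Matching Characteristic (CMC) scores. Given a probe-by-gallery distance matrix, the scores can be computed as below; this is a generic sketch that assumes a single-shot protocol where probe i's true match is gallery entry i, and evaluation details vary across papers.

```python
import numpy as np

def cmc_scores(dist, ranks=(1, 5, 10)):
    """CMC scores from a (probe x gallery) distance matrix.

    Assumes probe i's true match is gallery i. Returns the fraction of
    probes whose match appears within the top-r gallery entries.
    """
    order = np.argsort(dist, axis=1)                   # gallery sorted per probe
    hits = order == np.arange(dist.shape[0])[:, None]  # True at the match's rank
    match_rank = np.argmax(hits, axis=1)
    return {r: float(np.mean(match_rank < r)) for r in ranks}
```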
[Dataset] PRID 450S
This dataset was created in co-operation with the Austrian Institute of Technology for the purpose of testing person re-identification approaches. It is based on the PRID 2011 dataset and contains 450 image pairs recorded from two different, static surveillance cameras. Additionally, the dataset also provides an automatically generated, motion based foreground/background segmentation as well as a manual segmentation of parts of a person.
Mahalanobis Distance Learning for Person Re-Identification
In Person Re-Identification, pages 247-267, Springer, 2014
(The original publication is available at www.springer.com)
Contact: Martin Hirzer
[Code] LibBOB
BOB is a C++ online learning toolbox for computer vision that is easy to use, lightweight and simple. It supports several learning levels and allows exchanging components of the learner by modifying the configuration file. BOB was initially started by Martin Godec in mid-2009 at the Institute for Computer Graphics and Vision (ICG) at Graz University of Technology to replace and merge older frameworks and code present at the time. The provided pre-compiled Linux library supports binary and multi-class classification as well as different configuration parameters. It has been compiled under Ubuntu 9.10 (Karmic Koala, 64-bit) using OpenCV 2.0. If you want the source code for academic or personal use, please contact us.
- Context-driven Clustering by Multi-class Classification in an Active Learning Framework (bib) In Proc. Workshop on Use of Context in Video Processing (CVPR), 2010
- Online Multi-Class LPBoost (bib) In Proc. IEEE Conf. on Computer Vision and Pattern Recognition, 2010
- TransientBoost: On-line Boosting with Transient Data (bib) In Proc. IEEE Online Learning for Computer Vision Workshop (CVPR), 2010
- On Robustness of On-line Boosting - A Competitive Study (bib) In Proc. IEEE On-line Learning for Computer Vision Workshop, 2009
- On-line Random Forests (bib) In Proc. IEEE On-line Learning for Computer Vision Workshop, 2009
Contact: Martin Godec
[Dataset] Longterm Pedestrian Dataset (24h / 7 days / ~1 fps)
Download: longterm dataset
- Classifier Grids for Robust Adaptive Object Detection (bib) In Proc. IEEE Conf. on Computer Vision and Pattern Recognition, 2009
Contact: Sabine Sternig
[Demo] Real-time Action Recognition
Contact: Thomas Mauthner
[Dataset] Multi-Camera Datasets
- Easy Data Set (just one person)
- Medium Data Set (3-5 persons, used for the experiments)
- Hard Data Set (crowded scene, 5+ persons)
- Centralized Information Fusion for Learning Object Detectors in Multi-Camera Networks (bib) In Proc. Workshop of the Austrian Association for Pattern Recognition, 2010
- Multiple Instance Learning from Multiple Cameras (bib) In Proc. IEEE Workshop on Camera Networks (CVPR), 2010
- Online Learning of Person Detectors by Co-Training from Multiple Cameras (bib) In Multi-Camera Networks, Principles and Applications, pages 313-334, Academic Press, 2009
- Visual On-line Learning in Distributed Camera Networks (bib) In Proc. Int'l Conf. on Distributed Smart Cameras, 2008
Contact: Peter M. Roth
[Dataset] Text and Vision (TVGraz)
TVGraz is an annotated multi-modal dataset that currently contains 10 visual object categories, 4030 images, and the associated text. The visual appearance of the objects in the dataset is challenging and offers a less biased benchmark. The dataset aims to provide a common means for evaluating object categorization research based on both text and vision.
The archive "TVGraz_script.tar.gz" contains a Python script named "download_TVGRAZ_dataset.py", which, when executed, downloads the TVGraz images and text from their respective URLs according to the "category_list.txt" file. After downloading, the textual data is stored in raw format per category and per image.
Download: TVGraz dataset capturing tool
Contact: Inayatullah Khan
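The download step amounts to fetching every URL from a per-category list. The sketch below is a hypothetical simplification with assumed file names and layout; the provided download_TVGRAZ_dataset.py script handles the actual archive structure.

```python
import os
import urllib.request

def download_category(list_file, out_dir):
    """Fetch every URL listed (one per line) in a per-category file.

    Files are saved into out_dir with a running index prefix so the
    original ordering is preserved.
    """
    os.makedirs(out_dir, exist_ok=True)
    with open(list_file) as fh:
        for n, url in enumerate(line.strip() for line in fh):
            if not url:
                continue
            dest = os.path.join(out_dir, "%05d_%s" % (n, os.path.basename(url)))
            try:
                urllib.request.urlretrieve(url, dest)
            except OSError as err:  # links in web-crawled datasets often die
                print("skipping", url, err)
```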