On-line Learning from Multiple Cameras


Recently, combining information from multiple cameras has been shown to be very beneficial for object detection and tracking. In contrast, the goal of this project is to train detectors by exploiting the vast amount of unlabeled data made available by the geometry of a specific multi-camera setup. Starting from a small number (as small as one!) of positive training samples, we apply a co-training strategy to generate new, highly valuable samples from unlabeled data that could not be obtained otherwise [4].
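As a rough illustration of how scene geometry can link unlabeled data across views, the sketch below maps the foot point of a detection from one camera into another through a ground-plane homography. This is a minimal sketch under assumptions that are not stated on this page: the homography `H_ab` is known from calibration, boxes are given as `(x, y, w, h)`, and the bottom center of a box lies on the ground plane. All function names are hypothetical.

```python
import numpy as np

def foot_point(box):
    """Bottom-center of a box (x, y, w, h); assumed to touch the ground plane."""
    x, y, w, h = box
    return np.array([x + w / 2.0, y + h])

def project_point(H, point):
    """Map a 2D image point through a 3x3 homography using homogeneous coordinates."""
    p = H @ np.array([point[0], point[1], 1.0])
    return p[:2] / p[2]  # normalize by the homogeneous coordinate

def transfer_detection(H_ab, box_a):
    """Foot point in view B that corresponds to a detection box in view A."""
    return project_point(H_ab, foot_point(box_a))

# Toy homography (assumption): a pure image-plane translation by (10, -5).
H_ab = np.array([[1.0, 0.0, 10.0],
                 [0.0, 1.0, -5.0],
                 [0.0, 0.0, 1.0]])
location_in_b = transfer_detection(H_ab, (100, 50, 40, 80))
```

The classifier of the second view can then be evaluated at the transferred location, so that each view provides evidence for labeling samples in the other.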

To cope with insufficiently precise (but correct) detections during the training process, we introduced a multi-camera multiple-instance learning (MIL) approach, which clearly improves robustness during learning. Further, to extend the approach to more than two cameras, we introduced a more efficient centralized information fusion approach [1]. The approach, although not limited to this application, is applied to learning a person detector in different challenging scenarios. In fact, we can show that even when starting from a very small number of labeled samples, we finally obtain a classifier yielding state-of-the-art detection results (for which tens of thousands of labeled samples are typically required!).
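The idea behind multiple-instance learning can be sketched as follows: instead of trusting a single, possibly misaligned box, a bag of slightly shifted boxes is formed, and the bag counts as positive if at least one of its instances is positive. The code below is a minimal sketch, not the formulation used in the publications; the shift pattern and the noisy-OR bag model are assumptions chosen for illustration, and all names are hypothetical.

```python
import numpy as np

def make_bag(box, radius=4, step=4):
    """Bag of boxes (x, y, w, h) shifted around a candidate detection.

    radius and step (in pixels) are illustrative values, not taken from the source.
    """
    x, y, w, h = box
    offsets = range(-radius, radius + 1, step)
    return [(x + dx, y + dy, w, h) for dx in offsets for dy in offsets]

def bag_probability(instance_probs):
    """Noisy-OR: probability that at least one instance in the bag is positive."""
    probs = np.asarray(instance_probs, dtype=float)
    return 1.0 - np.prod(1.0 - probs)

bag = make_bag((10, 10, 20, 40))       # 3 x 3 = 9 shifted boxes
p_bag = bag_probability([0.5, 0.5])    # 1 - 0.5 * 0.5 = 0.75
```

With such a bag, a detection that is correct but slightly off-center can still contribute a positive update, since only one instance in the bag needs to be well aligned.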

Demo: Learning from Multiple Cameras

To demonstrate the proposed approach, we show two experiments: (I) scene-specific on-line co-training and evaluation on the lab scenario, and (II) a generalizing classifier, trained on-line on the lab scenario and evaluated on independent test sets. To play the videos, just click the corresponding images!

For Experiment I, the left and the right video show the detection results obtained by evaluating the initial and the finally obtained classifier, respectively. The video in the middle visualizes the updates performed during co-training: a red bounding box indicates a negative update, a green bounding box indicates a positive update, and a white bounding box indicates a detection that is not used for updating. Note that only 5 (!) labeled samples were used to initialize the training process!
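The three kinds of boxes in the visualization correspond to a three-way decision per detection. The exact decision rule is not given on this page, so the following is only a hypothetical sketch: it assumes that the other views return confidences in [-1, 1] at the geometrically corresponding location and that two thresholds, `theta_pos` and `theta_neg`, separate the three outcomes.

```python
from enum import Enum

class Update(Enum):
    POSITIVE = "green"   # detection confirmed, used as positive update
    NEGATIVE = "red"     # detection rejected, used as negative update
    NONE = "white"       # ambiguous, not used for updating

def decide_update(other_view_confs, theta_pos=0.5, theta_neg=-0.5):
    """Hypothetical rule: average the confidences of the other views and threshold."""
    if not other_view_confs:
        return Update.NONE  # no second opinion available
    mean_conf = sum(other_view_confs) / len(other_view_confs)
    if mean_conf >= theta_pos:
        return Update.POSITIVE
    if mean_conf <= theta_neg:
        return Update.NEGATIVE
    return Update.NONE

decision = decide_update([0.9, 0.7])
```

Leaving ambiguous detections unused is a conservative design choice: a skipped update costs little, whereas a wrongly labeled update can degrade an on-line classifier.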

For Experiment II, we want to show that a classifier trained on our lab scenario also generalizes to different scenarios. For that purpose, we initialized a classifier as described above, co-trained it from multiple cameras in our lab, and applied the finally obtained classifier to a totally different data set. The detection results for the initial and final classifier, respectively, are shown below.
[Videos, Experiment I: Initial Classifier | Initial Process | Final Classifier]
[Videos, Experiment II: Initial Classifier | Final Classifier]

Learning Behavior and Final Results

First, we demonstrate the on-line learning behavior of the proposed approaches. For that purpose, we trained an initial classifier using a small number of labeled samples. The classifier was cloned and used to initialize the co-training process for each camera. These initial classifiers were then updated by co-training. To demonstrate the learning progress, we saved the corresponding classifier after a pre-defined number of processed training frames and evaluated it on an independent test sequence.
[Figure: Learning over time. Panels: precision over time, recall over time.]
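Precision and recall for one saved classifier on one test frame can be computed roughly as in the sketch below. The matching criterion is an assumption: this page does not state it, so the code uses a greedy one-to-one matching with an intersection-over-union threshold of 0.5, which is a common choice for detection benchmarks. Boxes are `(x, y, w, h)` and all names are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x, y, w, h)."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    ix = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    iy = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0

def precision_recall(detections, ground_truth, min_iou=0.5):
    """Greedy one-to-one matching of detections to ground-truth boxes."""
    unmatched = list(ground_truth)
    tp = 0
    for det in detections:
        best = max(unmatched, key=lambda gt: iou(det, gt), default=None)
        if best is not None and iou(det, best) >= min_iou:
            unmatched.remove(best)  # each ground-truth box is matched at most once
            tp += 1
    fp = len(detections) - tp
    fn = len(unmatched)
    precision = tp / (tp + fp) if detections else 1.0
    recall = tp / (tp + fn) if ground_truth else 1.0
    return precision, recall

gt = [(0, 0, 10, 10), (50, 50, 10, 10)]
dets = [(1, 1, 10, 10), (200, 200, 10, 10)]
precision, recall = precision_recall(dets, gt)
```

Accumulating the true positive, false positive, and false negative counts over all frames of the test sequence, once per saved classifier, yields one point on each of the two curves.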
Next, we give a comparative study against state-of-the-art person detectors for the Lab Scenario as well as for the Forecourt Scenario. In addition to the adaptive methods described above, we compared the results to fixed person detectors, i.e., the Dalal & Triggs person detector and the person detector trained using the deformable part model of Felzenszwalb et al. For the adaptive methods, the classifiers were trained on the same training data as the proposed method, and the finally obtained classifiers were evaluated on the test sequences.
[Figure: Final results, scene-specific setup. Panels: Forecourt Scenario, Lab Scenario.]
Finally, we show that the proposed approach not only allows training scene/view-specific classifiers, but that these classifiers also generalize to different views/scenarios. For that purpose, we performed multi-camera (MC) co-training on the Lab Scenario data set and applied the finally obtained classifier to two publicly available standard benchmark data sets, i.e., the PETS'06 and the CAVIAR data set.
[Figure: Final results, generalized setup. Panels: PETS'06, CAVIAR.]

Data Sets

The data sets used in publications [1-4] can be downloaded below. We will also provide ground truth for the evaluation sets soon.

Selected Publications

  1. Centralized Information Fusion for Learning Object Detectors in Multi-Camera Networks. Armin Berger, Peter M. Roth, Christian Leistner, and Horst Bischof. In Proc. Workshop of the Austrian Association for Pattern Recognition, 2010.
  2. Multiple Instance Learning from Multiple Cameras. Peter M. Roth, Christian Leistner, Armin Berger, and Horst Bischof. In Proc. IEEE Workshop on Camera Networks (CVPR), 2010.
  3. Online Learning of Person Detectors by Co-Training from Multiple Cameras. Peter M. Roth, Christian Leistner, Helmut Grabner, and Horst Bischof. In Multi-Camera Networks, Principles and Applications, pages 313-334, Academic Press, 2009.
  4. Visual On-line Learning in Distributed Camera Networks. Christian Leistner, Peter M. Roth, Helmut Grabner, Andreas Starzacher, Horst Bischof, and Bernhard Rinner. In Proc. Int'l Conf. on Distributed Smart Cameras, 2008.

Copyright 2010 ICG