Abstract
Next-generation machine learning requires stepping away from classical batch learning towards interactive and adaptive learning. This is essential to cope with the demanding machine learning applications we already face today. Our workshop at ECCV 2018 in Munich therefore served as a discussion forum for experts in this field, and in the following we give a brief overview. Please note that this discussion paper has not been peer-reviewed and only contains a subjective summary by the workshop organizers.
1 Scope of the Workshop
Learning algorithms are the backbone of computer vision research and are still focused on training from large amounts of already annotated data. The limitations we currently observe in many applications are mostly due to the lack of annotations or to data distributions that change over time. To overcome these barriers, annotation and model learning need to be strongly coupled through human-machine interaction. Furthermore, models need to adapt as needed to handle either distribution shifts or completely novel data. The goal of this workshop was to discuss and present advances in technologies that support annotation, model learning through expert guidance, and continuous model adaptation.
The interactive and adaptive learning (IAL) workshop aimed to bridge one of the gaps between results of basic AI research and their real-world applicability: the availability of useful, easy-to-produce annotations and of working solutions for efficient model adaptation. Consequently, the following topics were central to the workshop:
-
Online and incremental learning
-
Interactive segmentation and detection to support annotation
-
Transfer learning
-
Active or self-taught learning
-
Continuous/lifelong learning
-
Open set learning
-
Open domain learning
-
Efficient fine-tuning of generic models.
These topics are often treated as separate research fields; however, they should be considered jointly. While preparing this workshop, we once phrased this area as Machine Didactics, referring to the fact that we need to improve not only the training but also the teaching of models, which includes the way we collect and annotate data. For many applications, it is simply unreasonable to assume a clear division between an annotation phase and a phase in which a model is trained and tested on the data. In practice, there is always a continuous cycle of improving the model and challenging the annotators with more data and further requirements. Currently, this process is still driven by manual work from machine learning engineers as well as domain experts. The basic question for the future is how this can be assisted by suitable algorithms as well, such as active learning algorithms that choose the examples to annotate, or bootstrapped feedback loops that allow experts to tune and check annotations rather than creating them from scratch with great effort.
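The active learning idea mentioned above can be illustrated by its simplest instantiation, least-confidence sampling: among all unlabeled examples, query those whose current model prediction is least certain. This is a minimal sketch with illustrative names and toy numbers, not the method of any particular speaker:

```python
import numpy as np

def select_for_annotation(probs: np.ndarray, k: int) -> np.ndarray:
    """Pick the k samples whose predicted class distribution is least
    confident (smallest maximum class probability)."""
    confidence = probs.max(axis=1)     # (n_samples,)
    return np.argsort(confidence)[:k]  # indices of least confident samples

# Toy posterior estimates for five unlabeled images over three classes.
probs = np.array([
    [0.98, 0.01, 0.01],  # very confident -> not queried
    [0.40, 0.35, 0.25],  # uncertain      -> queried
    [0.90, 0.05, 0.05],
    [0.34, 0.33, 0.33],  # most uncertain -> queried first
    [0.85, 0.10, 0.05],
])
query = select_for_annotation(probs, k=2)  # -> indices [3, 1]
```

The queried examples would then be handed to the annotator, and the model retrained, closing the feedback loop sketched above.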
Another important aspect of the workshop stems from the fact that the requirements on a machine learning system are often unclear in the beginning. In reality, the number of classes that need to be differentiated in a classification task is often simply not defined, or is likely to increase and change over time. This is referred to as an open world situation and is far more challenging than the standard ImageNet-like competition task most researchers focus on today. We were therefore very happy to have an associated challenge on open-set face recognition, organized by Terry and Walter, that was presented in detail during the workshop.
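A minimal way to turn a closed-set classifier into an open-set one is to reject predictions whose top class probability falls below a threshold. This is a common baseline sketched with illustrative names, not the official protocol of the associated challenge:

```python
import numpy as np

def open_set_predict(probs: np.ndarray, threshold: float = 0.6,
                     unknown: int = -1) -> np.ndarray:
    """Return the argmax class where the model is confident enough,
    and the 'unknown' label otherwise."""
    labels = probs.argmax(axis=1)
    labels[probs.max(axis=1) < threshold] = unknown
    return labels

probs = np.array([
    [0.90, 0.10],  # confidently class 0
    [0.55, 0.45],  # ambiguous -> rejected as unknown
    [0.20, 0.80],  # confidently class 1
])
preds = open_set_predict(probs)  # -> [0, -1, 1]
```

Choosing the threshold is itself an open problem; it trades off false rejections of known classes against false acceptances of unknown ones.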
In addition to the aforementioned risks, a real-world machine learning application is likely to face changes of input conditions resulting from changing a sensor or the application field. Dealing with this problem requires training from only a few examples by transfer learning or learning of generic representations that allow for jump-starting learning for various tasks.
We invited extended abstract submissions related to the workshop scope and also compiled the list of invited speakers according to the fit of their main research interests to the workshop idea.
2 Invited Speakers
2.1 Incremental Learning: A Critical View on the Current State of Affairs (Tinne Tuytelaars)
In the first invited talk of the workshop, Tinne Tuytelaars (KU Leuven) gave an overview of recent developments in the field of incremental learning. She highlighted current scenarios of incremental learning and argued that the majority of existing approaches are hardly comparable due to mismatched assumptions on the availability of tasks and data over time. She presented several approaches from her recent work [1,2,3,4] which tackle this issue and address the problem of catastrophic forgetting, e.g., by encouraging sparse representations to leave model capacity for subsequent tasks that are added over time.
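Many of the forgetting countermeasures discussed in this line of work share one ingredient: a penalty that discourages changing parameters deemed important for earlier tasks, as in memory aware synapses [2]. A minimal numpy sketch of that penalty, with illustrative names and toy values:

```python
import numpy as np

def continual_loss(task_loss, params, old_params, importance, lam=1.0):
    """New-task loss plus an importance-weighted quadratic penalty on
    parameter drift away from the values learned on earlier tasks."""
    penalty = sum(float(np.sum(w * (p - p0) ** 2))
                  for p, p0, w in zip(params, old_params, importance))
    return task_loss + lam * penalty

# One toy layer: the second weight was important for the old task, so
# moving it from 0.0 to 2.0 is penalized; the first weight moves freely.
old = [np.array([1.0, 0.0])]
new = [np.array([5.0, 2.0])]
imp = [np.array([0.0, 1.0])]
total = continual_loss(0.5, new, old, imp, lam=1.0)  # 0.5 + 1.0 * 2.0**2 = 4.5
```

The methods differ mainly in how the importance weights are estimated, e.g., from gradients of the learned function in [2].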
2.2 Results and Evaluation of the Open-Face Challenge (Manuel Günther)
Manuel Günther (UCCS) presented the UnConstrained College Students (UCCS) dataset, which underlies the Open-Face Challenge [5]. Subjects are photographed using a long-range high-resolution surveillance camera. Faces in these images exhibit various poses and varying levels of blur and occlusion. The challenge comprises a closed-set recognition problem as well as an open-set recognition problem. In addition, different attack scenarios are evaluated. More information about the challenge, the data, terms of usage, and recent results can be found on the challenge’s webpage at http://vast.uccs.edu/Opensetface/.
During the discussion, the effort spent and the availability of the dataset were positively acknowledged. All participants further agreed on the difficulty of re-identifying individuals based on single sub-images. Nonetheless, the question was raised why the re-identification task is posed on single images, whereas the ground truth to validate the IDs required entire video clips (which would also likely be the final application scenario).
2.3 Recognition with Unseen Compositions and Novel Environments (Kristen Grauman)
Kristen Grauman (UT Austin) put emphasis on two aspects of open-ended learning: how to recognize unseen compositions of objects and operators as well as how to operate and navigate in unseen environments.
In her recent work [6], Kristen and her team show how operations such as slicing an apple, i.e., operations which transform objects, can be modeled as object-operator pairs and can be realized as operators applied to object representations. Appropriate embeddings are learned by optimizing a triplet-loss and additionally adding semantic regularizers, e.g., enforcing operators to be invertible which resembles undoing a transformation. In consequence, the notion of operators can also be generalized to new compositions of operator-object-pairs.
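The object-operator idea can be sketched as follows: an attribute such as “sliced” acts as a linear map on an object embedding, and a triplet loss pulls the composed embedding towards matching image embeddings. This is an illustrative simplification of [6], with made-up dimensions and names:

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge loss: the anchor must be closer to the positive than to the
    negative by at least the margin."""
    d_pos = float(np.linalg.norm(anchor - positive))
    d_neg = float(np.linalg.norm(anchor - negative))
    return max(0.0, d_pos - d_neg + margin)

rng = np.random.default_rng(0)
dim = 8
apple = rng.normal(size=dim)             # object embedding
op_sliced = rng.normal(size=(dim, dim))  # attribute "sliced" as a linear operator
sliced_apple = op_sliced @ apple         # composed embedding

# Train so that the composition matches an image of a sliced apple
# (positive) rather than an image of a whole apple (negative).
loss = triplet_loss(sliced_apple, sliced_apple + 0.01, apple)
```

The invertibility regularizer mentioned above would additionally require that applying the inverse operator to `sliced_apple` recovers (approximately) the original `apple` embedding.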
In the second part of her talk, Kristen focused on self-learning agents which are faced with environments that have been unseen at training time [7]. Based on a reinforcement learning approach, they proposed an additional reward for actions which reduce the estimated uncertainty about the agent’s environment. An interesting future direction is to combine this unsupervised exploration with active look-ahead strategies [8].
2.4 Interactive Video Segmentation: The DAVIS Benchmark and First Approaches (Jordi Pont-Tuset)
Jordi Pont-Tuset (Google AI) gave an overview of his work on video segmentation. In particular, he presented the DAVIS benchmark [9] and the video segmentation approach published in [10]. The latter only requires the annotation of a few key frames and propagates the region segmentation to the whole video. This work is one example of the workshop’s focus on reducing annotation effort by interactive segmentation and, in general, on assisting the annotator by propagating annotations in an intelligent manner. Especially for pixel-wise video segmentation, fully manual annotation is often intractable. One of the key ideas of the underlying algorithm is to use metric learning to phrase the segmentation as a retrieval problem at the pixel level.
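The retrieval formulation can be sketched as follows: embed every pixel, then label each pixel of a new frame with the label of its nearest annotated reference pixel. A toy numpy version with illustrative names; the actual approach [10] learns the pixel embedding with a deep network:

```python
import numpy as np

def propagate_labels(ref_emb, ref_labels, query_emb):
    """Segmentation as retrieval: each query pixel takes the label of its
    nearest reference pixel in embedding space."""
    # (n_query, n_ref) pairwise squared Euclidean distances
    d = ((query_emb[:, None, :] - ref_emb[None, :, :]) ** 2).sum(axis=-1)
    return ref_labels[d.argmin(axis=1)]

# Two annotated reference pixels (background = 0, object = 1) and three
# query pixels from a later, unannotated frame.
ref_emb = np.array([[0.0, 0.0], [1.0, 1.0]])
ref_labels = np.array([0, 1])
query_emb = np.array([[0.1, 0.2], [0.9, 0.8], [0.4, 0.4]])
labels = propagate_labels(ref_emb, ref_labels, query_emb)  # -> [0, 1, 0]
```

Because labeling reduces to a nearest-neighbor lookup, the annotator can correct a few pixels and immediately see the segmentation of the whole video update.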
2.5 Towards Continual Learning and Interactive Annotation (Christoph Lampert)
Christoph Lampert (IST Austria) presented recent results in the area of lifelong learning and interactive annotation. In the first part of his talk, he reviewed iCaRL, Incremental Classifier and Representation Learning [11], which jointly learns appropriate embeddings and classification models as new data is added. In continuous learning scenarios, it is further possible that unlabeled data is available and individual tasks can be selected for annotation. How to select tasks such that information can be optimally transferred was shown in [12]. To assist in the annotation of new data, learnable bounding box dialogs for interactive annotation were presented in [13]. Finally, his work in [14] shows a simple yet powerful statistical test to detect whether an incoming stream of data deviates from the data a model has been trained on. By comparing distributions of model confidence scores, e.g., the maximum class score of deep convnets, the KS-test yields a probability that an entire batch of test samples stems from a different data distribution, e.g., induced by sensor drift.
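The core of this test can be sketched with a hand-rolled two-sample Kolmogorov-Smirnov statistic over confidence scores. The data here is synthetic, and the actual KS(conf) procedure [14] additionally calibrates a decision threshold from held-out data:

```python
import numpy as np

def ks_statistic(a, b):
    """Two-sample KS statistic: the maximum gap between the empirical
    cumulative distribution functions of the two samples."""
    a, b = np.sort(a), np.sort(b)
    grid = np.concatenate([a, b])
    cdf_a = np.searchsorted(a, grid, side="right") / len(a)
    cdf_b = np.searchsorted(b, grid, side="right") / len(b)
    return float(np.abs(cdf_a - cdf_b).max())

rng = np.random.default_rng(0)
# Max-softmax confidences at validation time (mostly confident) versus an
# incoming batch after a simulated sensor drift (confidence collapses).
val_conf = rng.beta(8, 2, size=500)
drift_conf = rng.beta(2, 2, size=500)
drift_score = ks_statistic(val_conf, drift_conf)  # large -> distribution shift
same_score = ks_statistic(val_conf, val_conf)     # identical samples -> 0.0
```

A large statistic (or, equivalently, a small p-value under the KS null distribution) flags that the batch operates outside the model's specifications.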
2.6 Elements of Continuous Learning for Wildlife Monitoring (Joachim Denzler)
Joachim Denzler (Univ. Jena) presented recent advances in continuous learning, especially focusing on active learning and anomaly detection. With the contributions of his group, he showed how application experts can be assisted in analyzing large-scale data using interactive machine learning tools, e.g., by spotting abnormal instances [15,16,17], interactively learning object classifiers [18, 19], regression models for animal age [20], or object detectors [21], and classifying large data collections from camera traps in a semi-automated fashion [22,23,24]. In summary, the recent tools and techniques already add large value to the application scientist’s work. Nonetheless, reliable and efficient interactive learning with deep neural networks remains an unsolved problem.
3 Extended Abstracts
| Authors | Title |
| --- | --- |
| Neal et al. | Open set learning with counterfactual images |
| Günther et al. | Open-set recognition challenge |
| Busto et al. | Open set domain adaptation for image and action recognition |
| Dwivedi and Roig | Evaluation of plug and play modules for multi-domain learning |
| Jin et al. | Unsupervised hard example mining from videos for improved object detection |
| Osep et al. | Towards large-scale video object mining |
| Wang and Sharma | Unsupervised representation learning on multispectral imagery by predicting held-out bands |
| Sharma and Wang | Human-in-the-loop segmentation for improved segmentation and annotations |
| Bauermeister et al. | Adaptive network architectures via linear splines |
| Rakelly et al. | Few-shot segmentation propagation with guided networks |
4 Summary and Next Steps
The workshop successfully served as a venue for exchanging recent trends in the field of interactive and adaptive learning in an open world. The combination of invited speakers covering a broad technical spectrum as well as a short and informal poster session allowed for detailed discussions and for fostering connections.
The audience expressed strong interest in continuing the workshop over the next years. Of great benefit would be the continuation of a co-located challenge, especially in the area of open-set recognition.
References
Aljundi, R., Chakravarty, P., Tuytelaars, T.: Expert gate: lifelong learning with a network of experts. In: CVPR, pp. 7120–7129 (2017)
Aljundi, R., Babiloni, F., Elhoseiny, M., Rohrbach, M., Tuytelaars, T.: Memory aware synapses: learning what (not) to forget. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11207, pp. 144–161. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01219-9_9
Rannen, A., Aljundi, R., Blaschko, M.B., Tuytelaars, T.: Encoder based lifelong learning. In: IEEE Conference on Computer Vision and Pattern Recognition, CVPR (2017)
Aljundi, R., Rohrbach, M., Tuytelaars, T.: Selfless sequential learning. arXiv preprint arXiv:1806.05421 (2018)
Günther, M., Cruz, S., Rudd, E.M., Boult, T.E.: Toward open-set face recognition. In: IEEE Conference on Computer Vision and Pattern Recognition - Workshops, CVPRW, pp. 573–582. IEEE (2017)
Nagarajan, T., Grauman, K.: Attributes as operators: factorizing unseen attribute-object compositions. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11205, pp. 172–190. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01246-5_11
Jayaraman, D., Grauman, K.: Learning to look around: intelligently exploring unseen environments for unknown tasks. In: IEEE Conference on Computer Vision and Pattern Recognition, CVPR (2018)
Ramakrishnan, S.K., Grauman, K.: Sidekick policy learning for active visual exploration. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11216, pp. 424–442. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01258-8_26
Caelles, S., et al.: The 2018 DAVIS challenge on video object segmentation. arXiv preprint arXiv:1803.00557 (2018)
Chen, Y., Pont-Tuset, J., Montes, A., Van Gool, L.: Blazingly fast video object segmentation with pixel-wise metric learning. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1189–1198 (2018)
Rebuffi, S.A., Kolesnikov, A., Sperl, G., Lampert, C.H.: iCaRL: incremental classifier and representation learning. In: CVPR (2017)
Pentina, A., Lampert, C.H.: Multi-task learning with labeled and unlabeled tasks. In: International Conference on Machine Learning, ICML (2017)
Konyushkova, K., Uijlings, J., Lampert, C.H., Ferrari, V.: Learning intelligent dialogs for bounding box annotation. In: IEEE Conference on Computer Vision and Pattern Recognition, CVPR (2018)
Sun, R., Lampert, C.H.: KS(conf): a light-weight test if a ConvNet operates outside of its specifications. In: German Conference on Pattern Recognition, GCPR (2018)
Barz, B., Rodner, E., Garcia, Y.G., Denzler, J.: Detecting regions of maximal divergence for spatio-temporal anomaly detection. IEEE Trans. Pattern Anal. Mach. Intell. (2018)
Garcia, Y.G., Shadaydeh, M., Mahecha, M., Denzler, J.: Extreme anomaly event detection in biosphere using linear regression and a spatiotemporal MRF model. Nat. Hazards, 1–19 (2018)
Schultheiss, A., Käding, C., Freytag, A., Denzler, J.: Finding the unknown: novelty detection with extreme value signatures of deep neural activations. In: Roth, V., Vetter, T. (eds.) GCPR 2017. LNCS, vol. 10496, pp. 226–238. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-66709-6_19
Käding, C., Freytag, A., Rodner, E., Bodesheim, P., Denzler, J.: Active learning and discovery of object categories in the presence of unnameable instances. In: IEEE Conference on Computer Vision and Pattern Recognition, CVPR, pp. 4343–4352 (2015)
Käding, C., Rodner, E., Freytag, A., Denzler, J.: Watch, ask, learn, and improve: a lifelong learning cycle for visual recognition. In: European Symposium on Artificial Neural Networks, ESANN, pp. 381–386 (2016)
Käding, C., Rodner, E., Freytag, A., Mothes, O., Barz, B., Denzler, J.: Active learning for regression tasks with expected model output changes. In: British Machine Vision Conference, BMVC (2018)
Brust, C.A., Käding, C., Denzler, J.: Active learning for deep object detection. arXiv preprint arXiv:1809.09875 (2018)
Körschens, M., Barz, B., Denzler, J.: Towards automatic identification of elephants in the wild. In: AI for Wildlife Conservation Workshop, AIWC (2018)
Brust, C.A., et al.: Towards automated visual monitoring of individual gorillas in the wild. In: ICCV Workshop on Visual Wildlife Monitoring, ICCV-WS, pp. 2820–2830 (2017)
Freytag, A., Rodner, E., Simon, M., Loos, A., Kühl, H.S., Denzler, J.: Chimpanzee faces in the wild: log-euclidean CNNs for predicting identities and attributes of primates. In: Rosenhahn, B., Andres, B. (eds.) GCPR 2016. LNCS, vol. 9796, pp. 51–63. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-45886-1_5
© 2019 Springer Nature Switzerland AG

Freytag, A., et al. (2019). Workshop on Interactive and Adaptive Learning in an Open World. In: Leal-Taixé, L., Roth, S. (eds.) Computer Vision – ECCV 2018 Workshops. LNCS, vol. 11134. Springer, Cham. https://doi.org/10.1007/978-3-030-11024-6_38