Detecting overlapping instances in microscopy images using extremal region trees
Introduction
Automatic detection of objects (e.g. cell colonies, individual cells or nuclei) in microscopy images plays a crucial role in the analysis of microscopy-based experiments across a wide variety of clinical and commercial applications. On its own, detection determines the presence (and number) of objects of interest, such as cancer cells in a pathology image; it can also serve as the starting point for other tasks such as object segmentation or tracking. Among the challenges of object detection in microscopy images, one stands out: images often contain a large number of objects, many of which partially overlap.
In many microscopy imaging modalities, objects of interest can be identified as bright or dark blobs in one of the image channels. Such blobs correspond to extremal regions (Matas et al., 2004), i.e. connected components of the image thresholded at some intensity level, and a natural approach to detecting and understanding such images is (a) to consider the set of all extremal regions and (b) to identify those extremal regions that actually correspond to objects of interest. This is the approach that we pursue in this work. Several key challenges need to be addressed to make this approach successful, namely:
- Each object of interest typically corresponds to multiple, very similar, overlapping extremal regions. The challenge is then to pick a subset of regions so that each object of interest is represented by exactly one region. We show how this can be achieved by organizing extremal regions into a tree-shaped (or forest-shaped) discrete graphical model with binary labels. Message propagation (dynamic programming) in such a tree (forest) then produces the desired subset of non-overlapping extremal regions corresponding to objects.
- In more challenging images, groups of tightly overlapping objects (e.g. cells in a dense cluster) often cannot be distinguished on the basis of extremal regions. In other words, for certain objects there may be no extremal region that includes one object but excludes the others in the same group. We show that the model can be extended to handle such challenging situations. The extended model identifies the blobs (extremal regions) that correspond to multiple overlapping objects and simultaneously labels each selected region with the number of objects it contains (Fig. 1). This extension greatly widens the applicability of the approach without changing the topology of the underlying graphical model or increasing the complexity of the inference.
- Apart from the model and the inference in it, a key question is one of machine learning: how to identify which extremal regions correspond to objects and which do not (and, in the extended case, how many objects certain regions contain). We demonstrate that all of this can be learned in a weakly supervised setting, where the method is trained on a set of representative images in which each object is annotated only by a dot placed inside it. Training is performed using latent structured output support vector machines (Yu & Joachims, 2009) with a specially designed counting loss function.
- Finally, we address the task of automatically constructing an image channel whose extremal regions are well suited to our approach. Specifically, we propose a method that optimizes a linear combination of input channels (some of which may be filtered versions of other channels) to form the input image. After this optimization, the resulting image gives rise to extremal regions that allow efficient identification of individual objects or small groups of overlapping objects. This procedure requires only the same dot-annotated images as above.
We conduct a set of experiments with synthetic and real microscopy images and show that the proposed method achieves high detection accuracy despite large amounts of overlap and very low effective spatial resolution. We assess the effect of the different elements of our detection system and show that combining them yields the highest accuracy. The resulting system outperforms other methods for instance detection in microscopy images and is comparable in counting accuracy to methods that are trained specifically to count (and do not perform detection). While microscopy imaging modalities form a natural domain for our method, the proposed approach is general and can be applied to macroscopic medical or non-medical images, as demonstrated in Arteta et al. (2013).
This paper extends the previous conference papers (Arteta et al., 2012; Arteta et al., 2013) that developed the initial approach. In comparison to the more recent conference version (Arteta et al., 2013), this paper adds the following extensions: (i) it develops the inference procedure on the tree-structured graphical model in more detail (Section 5); (ii) it provides further evaluation of, and insights into, the loss function, along with a new variant of it (Section 6.1); (iii) it proposes a method for choosing a linear combination of input channels that optimizes the method’s performance (Section 7); and (iv) it provides additional experiments on challenging microscopy images (Section 8).
Instance detection in crowded scenes
Most computer vision methods that address the task of understanding images with multiple overlapping objects fall into two classes. The first is based on individual object detection. Such detection can be based on a sliding window or the Hough transform, followed by an appropriate non-maximum suppression procedure (Barinova, Lempitsky, Kohli, 2012, Desai, Ramanan, Fowlkes, 2011, Leibe, Leonardis, Schiele, 2008), stochastic fitting of interacting particles or object models (Descombes, Minlos,
Datasets
We first introduce the datasets and metrics used to evaluate the performance of our system before describing the system itself. Six distinct microscopy datasets spanning different modalities and imaging conditions are used. For each dataset, the training data are divided into several random splits, which are later used to compute means and standard deviations of the evaluation metrics. The task, in all cases, is to detect all instances of the objects (e.g. cells) which have
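The split-based reporting protocol can be sketched as below. This is a minimal illustration under stated assumptions: the split sizes, the seed and the helper names `random_splits` and `mean_and_std` are hypothetical, and the training and evaluation of a detector on each split is not shown.

```python
import random
import statistics


def random_splits(train_ids, n_splits, seed=0):
    """Shuffle the training images and divide them into n_splits disjoint,
    near-equal-sized subsets (assumed protocol; exact sizes are not given here)."""
    ids = list(train_ids)
    random.Random(seed).shuffle(ids)
    return [ids[k::n_splits] for k in range(n_splits)]


def mean_and_std(scores):
    """Mean and sample standard deviation of one metric over the splits."""
    return statistics.mean(scores), statistics.stdev(scores)
```

In use, one model would be trained per split, each model scored with the same metric (e.g. F1), and the list of per-split scores passed to `mean_and_std` to obtain the reported mean and standard deviation.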
Model overview
For an input image containing multiple instances of an object class (some of which may be overlapping) we want to automatically detect the instances and provide an estimate of their location. We start by generating a pool of $N$ nested regions (see Fig. 3 for a case where $N = 13$), such that for each pair of regions $R_i$ and $R_j$ in the pool, these regions are either nested (i.e. $R_i \subset R_j$ or $R_i \supset R_j$) or do not overlap ($R_i \cap R_j = \emptyset$). In the simplest case, a pool can comprise extremal regions of
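To make the nestedness property concrete, the following is a minimal sketch, assuming bright blobs on a small 2D grayscale image stored as nested lists, 4-connectivity, and an explicit list of thresholds. The function names are hypothetical and any further filtering of the pool used by the method is not shown. Because the set of pixels above a higher threshold is a subset of the set above a lower one, every component at a higher level lies inside exactly one component at the level below, which yields the forest structure.

```python
from collections import deque


def connected_components(mask):
    """4-connected components of a boolean mask; returns frozensets of (row, col)."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    comps = []
    for r in range(h):
        for c in range(w):
            if mask[r][c] and not seen[r][c]:
                seen[r][c] = True
                queue, pixels = deque([(r, c)]), []
                while queue:  # breadth-first flood fill
                    y, x = queue.popleft()
                    pixels.append((y, x))
                    for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                        if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                comps.append(frozenset(pixels))
    return comps


def extremal_region_forest(image, thresholds):
    """Bright extremal regions {p : image[p] >= t} for increasing t, linked
    into a forest. parent[i] is the index of the smallest enclosing region,
    or None for a root. Regions identical to their parent are not duplicated."""
    regions, parent = [], []
    prev = []  # indices of the regions present at the previous (lower) threshold
    for t in sorted(thresholds):
        mask = [[v >= t for v in row] for row in image]
        current = []
        for comp in connected_components(mask):
            enclosing = next((j for j in prev if comp <= regions[j]), None)
            if enclosing is not None and regions[enclosing] == comp:
                current.append(enclosing)  # unchanged region: reuse its node
                continue
            regions.append(comp)
            parent.append(enclosing)
            current.append(len(regions) - 1)
        prev = current
    return regions, parent
```

For example, an image with one bright peak inside a larger blob and a separate plateau gives two roots and one child region nested in the first root.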
Inference on the model
Given a set of nested candidate regions, let $V_i(d)$ denote the classifier score of a region $R_i$ for class $d$ (the higher the score, the more this region looks like a typical region containing $d$ object centroids). For notational simplicity, we also define $V_i(0) = 0$. We introduce the optimization variables $y = \{y_i \mid i = 1, \ldots, N\}$, where $y_i = 0$ means that the region $R_i$ is not selected, and $y_i = d \in \{1, \ldots, D\}$ means that the region $R_i$ is selected and assigned class $d$. We denote with … the set of all $y$ that meet the
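A sketch of how dynamic programming on the region forest can maximize $\sum_i V_i(y_i)$ under the constraint that no selected region contains another selected region is given below. It is a standard exact tree DP written under assumptions: the forest is given as a `parent` list (as a nested-region pool yields), `scores[i][d]` stands in for $V_i(d)$ with `scores[i][0] == 0`, and the name `select_regions` is hypothetical. It is not presented as the paper's exact procedure. Each node compares its best own label against the summed optimum of its children; the total object count is then $\sum_i y_i$.

```python
def select_regions(parent, scores):
    """Exactly maximize sum_i scores[i][y[i]] subject to: no two selected
    regions (y[i] > 0) lie on the same root-to-leaf path, i.e. selected
    regions never contain one another. scores[i][d] plays the role of V_i(d)
    for d = 0..D, with scores[i][0] == 0 meaning "not selected"."""
    n = len(parent)
    children = [[] for _ in range(n)]
    roots = []
    for i, p in enumerate(parent):
        if p is None:
            roots.append(i)
        else:
            children[p].append(i)
    # Depth-first pre-order: each node appears before all of its descendants.
    order, stack = [], list(roots)
    while stack:
        i = stack.pop()
        order.append(i)
        stack.extend(children[i])
    best = [0.0] * n  # optimal objective restricted to the subtree rooted at i
    label = [0] * n   # class for i if the optimum selects i itself, else 0
    for i in reversed(order):  # descendants first: messages flow upward
        below = sum(best[c] for c in children[i])
        d = max(range(1, len(scores[i])), key=lambda k: scores[i][k])
        if scores[i][d] > below:
            best[i], label[i] = scores[i][d], d
        else:
            best[i] = below
    # Top-down readout: a selected node excludes its whole subtree.
    y = [0] * n
    stack = list(roots)
    while stack:
        i = stack.pop()
        if label[i] > 0:
            y[i] = label[i]
        else:
            stack.extend(children[i])
    return sum(best[r] for r in roots), y
```

The work per node is one maximization over $D$ labels plus a sum over its children, so the cost grows linearly with the number of regions and with $D$.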
Learning region classifiers
The model for evaluating the candidate regions can be trained on weakly annotated (dotted) images and does not require more detailed annotations (e.g. bounding boxes). We therefore assume that we are given a set of images annotated with dots, where one dot is placed inside each instance of the object. Learning is driven by an instance count loss (IC-loss), denoted $L_{IC}$ (Eq. 3), that penalizes all deviations from one-to-one correspondences between annotation dots and the selected
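The idea of a count-based loss against dot annotations can be illustrated with the simplified sketch below. It is hypothetical and does not reproduce the paper's Eq. 3 or its penalization functions: here each selected region pays the absolute difference between its assigned class and the number of dots it contains, and each dot not covered by any selected region pays 1. Regions are assumed to be sets of `(row, col)` pixels and dots a list of distinct pixel coordinates.

```python
def instance_count_loss(regions, y, dots):
    """Simplified IC-loss sketch (hypothetical form):
    sum over selected regions of |label - dots inside|,
    plus one unit per dot outside every selected region."""
    loss = 0
    covered = set()
    for region, label in zip(regions, y):
        if label > 0:  # only selected regions are penalized for miscounts
            inside = [p for p in dots if p in region]
            loss += abs(label - len(inside))
            covered.update(inside)
    loss += sum(1 for p in dots if p not in covered)  # missed instances
    return loss
```

With this form, a labeling that assigns class 2 to a region containing two dots incurs no cost for that region, while selecting a region with no dots, or leaving a dot uncovered, each add to the loss.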
Crafting a surface for extremal region computation
Collecting extremal regions from the intensity channel of microscopy images as candidates for object detection is often successful (Arteta et al., 2012), but not optimal. For example, images with high levels of noise (e.g. weak-fluorescence images, Fig. 2b), low contrast, or highly inhomogeneous objects can break the assumption that there exist extremal regions which approximately represent each of the objects of interest, or even the weaker assumption that extremal regions
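Forming the surface as a linear combination of input channels, some of which are filtered versions of others, can be sketched as follows. This is a minimal illustration: the box filter is just one example of a derived channel, the names `box_blur` and `combine_channels` are hypothetical, and the way the weights are optimized from the dot annotations is not reproduced here.

```python
def box_blur(channel, radius=1):
    """Mean filter over a (2r+1) x (2r+1) window, clipped at the image
    borders; used here as an example of a filtered (derived) channel."""
    h, w = len(channel), len(channel[0])
    out = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            vals = [channel[y][x]
                    for y in range(max(0, r - radius), min(h, r + radius + 1))
                    for x in range(max(0, c - radius), min(w, c + radius + 1))]
            out[r][c] = sum(vals) / len(vals)
    return out


def combine_channels(channels, weights):
    """Surface S(p) = sum_k w_k * C_k(p); extremal regions are then computed on S."""
    if len(channels) != len(weights):
        raise ValueError("need exactly one weight per channel")
    h, w = len(channels[0]), len(channels[0][0])
    return [[sum(wk * ch[r][c] for wk, ch in zip(weights, channels))
             for c in range(w)]
            for r in range(h)]
```

For instance, `combine_channels([raw, box_blur(raw)], [w0, w1])` mixes a raw channel with its smoothed version; in the method, such weights would be chosen so that the resulting extremal regions separate objects well.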
Experiments and results
We now evaluate the performance of the model on the datasets described in Section 3. Within these experiments, the full system refers to the method described in this paper using the penalization function Δg (see Section 6.1) and the surface learning (see Section 7). Whether the optimized surface is used, however, is determined via cross-validation, so the surface is not enabled for datasets where it is not beneficial. For those datasets, extremal regions are instead collected directly from the intensity channel
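The choice between the optimized surface and the raw intensity channel can be sketched as a cross-validated selection. The callback `evaluate` and the candidate names are hypothetical placeholders for training the detector on one input type and scoring it on held-out images; the fold construction is likewise assumed.

```python
import statistics


def choose_by_cross_validation(candidates, folds, evaluate):
    """Return the candidate input with the highest mean validation score.
    evaluate(candidate, train, val) is a hypothetical callback that trains
    the detector using that input and scores it on the validation images."""
    means = {}
    for name in candidates:
        fold_scores = [evaluate(name, train, val) for train, val in folds]
        means[name] = statistics.mean(fold_scores)
    return max(candidates, key=lambda name: means[name]), means
```

Ties are resolved in favor of the candidate listed first, so listing the intensity channel first keeps the simpler input unless the surface is strictly better.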
Summary and conclusion
We have presented a method for object detection in microscopy images (extending Arteta et al., 2012; Arteta et al., 2013) that is particularly suitable for images with multiple overlapping instances of an object. Depending on the difficulty of the detection task, the model can choose to detect overlapping objects in groups containing a variable number of instances, or individual instances if the task is easy. Such ability
Acknowledgments
We acknowledge the researchers of the Laboratory for Viral RNA Biochemistry, Institute of Protein Research RAS, for providing the images and annotations of gels with molecular colonies, Dr. Svetlana Uzbekova (INRA, Physiology of Reproduction and Behavior Unit, Nouzilly, France) for providing the images of fluorescent nuclear stained bovine blastocysts, Dr. Boris Vojnovic and James Thompson (Grey Institute for Radiation Oncology and Biology, University of Oxford, UK) for providing the equipment
References (35)
- … et al. (2011). Pop out many small structures from a very large microscopic image. Med. Image Anal.
- … et al. (2004). Robust wide-baseline stereo from maximally stable extremal regions. Image Vis. Comput.
- … et al. (2006). Real-time monitoring of DNA colonies growing in a polyacrylamide gel. Analytical Biochemistry.
- … et al. (2010). Improved automatic detection and segmentation of cell nuclei in histopathology images. IEEE Trans. Biomed. Eng.
- … et al. Learning to detect cells using non-overlapping extremal regions.
- … et al. (2013). Learning to detect partially overlapping instances. Computer Vision and Pattern Recognition (CVPR), 2013 IEEE Conference.
- … et al. (2012). On detection of multiple object instances using Hough transforms. IEEE Trans. Pattern Anal. Mach. Intell.
- … et al. (2009). Bayesian Poisson regression for crowd counting. Computer Vision, 2009 IEEE 12th International Conference on.
- Chetverin, A. B., Chetverina, H. V. (1997). Method for amplification of nucleic acids in solid media. US Patent...
- Discriminative models for multi-class object layout. Int. J. Comput. Vision.
- Object extraction using a stochastic birth-and-death dynamics in continuum. J. Math. Imaging Vis.
- Fast crowd segmentation using shape indexing. Computer Vision, 2007. ICCV 2007. IEEE 11th International Conference.
- 3D segmentation by maximally stable volumes (MSVS). Pattern Recognition, 2006. ICPR 2006. 18th International Conference.
- Learning to count with regression forest and structured labels. Pattern Recognition (ICPR), 2012 21st International Conference.
- Pattern recognition in histopathological images: An ICPR 2010 contest. Recognizing Patterns in Signals, Speech, Images and Videos.
- A viewpoint invariant approach for crowd counting. Pattern Recognition, 2006. ICPR 2006. 18th International Conference.