Detecting overlapping instances in microscopy images using extremal region trees

doi:10.1016/j.media.2015.03.002

Medical Image Analysis

Volume 27, January 2016, Pages 3-16

https://doi.org/10.1016/j.media.2015.03.002

Highlights

• Detection of overlapping instances in microscopy with separate classes for tuples of objects.
• A convex formulation is driven by a novel Instance-Count Loss.
• Extremal region trees are used as candidates for instance detection.
• A surface that optimizes the pool of candidate regions is derived from microscopy images.
• The detection model can be trained from simple dot-annotations.

Abstract

In many microscopy applications, images may contain regions of both low and high cell density, corresponding to different tissues or to colonies at different stages of growth. This poses a challenge to most previously developed automated cell detection and counting methods, which are designed to handle either the low-density scenario (through cell detection) or the high-density scenario (through density estimation or texture analysis).

The objective of this work is to detect all the instances of an object of interest in microscopy images. The instances may be partially overlapping and clustered. To this end we introduce a tree-structured discrete graphical model that is used to select and label a set of non-overlapping regions in the image by a global optimization of a classification score. Each region is labeled with the number of instances it contains – for example regions can be selected that contain two or three object instances, by defining separate classes for tuples of objects in the detection process.

We show that this formulation can be learned within the structured output SVM framework and that inference in such a model can be accomplished using dynamic programming on a tree-structured region graph. Furthermore, the learning only requires weak annotations: a dot on each instance. The candidate regions for the selection are obtained as extremal regions of a surface computed from the microscopy image, and we show that the performance of the model can be improved by considering a proxy problem for learning the surface that allows better selection of the extremal regions. We also consider a number of variations of the loss function used in the structured output learning.

The model is applied and evaluated over six quite disparate data sets of images covering: fluorescence microscopy, weak-fluorescence molecular images, phase contrast microscopy and histopathology images, and is shown to exceed the state of the art in performance.

Introduction

Automatic detection of objects (e.g. cell colonies, individual cells or nuclei) in microscopy images plays a crucial role in the analysis of microscopy-based experiments within a wide variety of microscopy applications for clinical and commercial settings. On its own, detection is able to determine the presence (and quantity) of an object of interest, such as cancer cells in a pathology image, but furthermore, it can also be the starting point for other objectives such as object segmentation or tracking. Among the challenges that characterize object detection in microscopy images, one that stands out is the necessity to deal with the presence of a large number of objects, often partially overlapping.

In many microscopy imaging modalities, objects of interest can often be identified as bright or dark blobs in one of the image channels. Such blobs correspond to extremal regions (Matas et al., 2004), and a natural approach to detecting objects in such images and understanding them is (a) to consider the set of all extremal regions and (b) to identify those extremal regions that actually correspond to objects of interest. This is the approach that we pursue in this work. Several key challenges need to be addressed to make this approach successful, namely:

• Each object of interest typically corresponds to multiple, very similar and overlapping, extremal regions. The challenge then is to pick a subset of regions corresponding to objects of interest, so that each object of interest is represented by only one region. We show how this can be achieved by organizing extremal regions into a tree-shaped (or forest-shaped) discrete graphical model with binary labels. Message propagation (dynamic programming) in such a tree (forest) then produces a desired subset of non-overlapping extremal regions corresponding to objects.
• In more challenging images, it is often the case that groups of tightly overlapping objects (e.g. cells in a dense cluster) cannot be distinguished on the basis of extremal regions. In other words, for certain objects, there might not exist extremal regions that include one object but exclude others in the same group. We show that the model can be extended to handle such challenging situations. The extended model is able to identify the blobs (extremal regions) that correspond to multiple overlapping objects, and simultaneously to assign each selected extremal region a label indicating the number of objects it corresponds to (Fig. 1). This extension greatly widens the applicability of the approach without changing the topology of the underlying graphical model or increasing the complexity of the inference.
• Apart from the model and the inference in the model, a key question is one of machine learning, i.e. a method to identify which extremal regions correspond to objects and which do not (and in the extended case, to identify the number of objects within certain regions). We demonstrate that all this can be done in a weakly-supervised learning setting, so that the method is trained on a set of dotted representative images, where each object is annotated only by a dot placed inside it. The training is performed using latent structured output support vector machines (Yu & Joachims, 2009) with a specially designed counting loss function.
• Finally, we address the task of automatically identifying a “good” image channel that contains extremal regions that are “good” for our approach. Specifically, we propose a method that automatically optimizes over a linear combination of input channels (where some input channels can actually correspond to filtered versions of other channels) to determine an input image. After such an optimization, the resulting image gives rise to extremal regions that allow the efficient identification of individual objects or small groups of overlapping objects. This procedure only requires the same dot-annotated images as above.
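
To make the notion of extremal regions concrete, the following minimal Python sketch enumerates the bright extremal regions of a tiny grayscale image as the connected components of every thresholded version of it. The function names (components, extremal_regions), the use of 4-connectivity and the toy image are illustrative assumptions and are not taken from the paper; a practical system would rely on an efficient MSER-style implementation rather than this brute-force loop.

```python
from typing import FrozenSet, List, Sequence, Set, Tuple

Pixel = Tuple[int, int]


def components(pixels: Set[Pixel]) -> List[FrozenSet[Pixel]]:
    """Split a set of pixels into 4-connected components (assumed connectivity)."""
    seen: Set[Pixel] = set()
    found: List[FrozenSet[Pixel]] = []
    for start in sorted(pixels):
        if start in seen:
            continue
        seen.add(start)
        stack, comp = [start], set()
        while stack:
            r, c = stack.pop()
            comp.add((r, c))
            for nb in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                if nb in pixels and nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        found.append(frozenset(comp))
    return found


def extremal_regions(image: Sequence[Sequence[int]]) -> List[FrozenSet[Pixel]]:
    """Return all distinct bright extremal regions, smallest first.

    A bright extremal region is a connected component of the set of pixels
    whose intensity is at least some threshold t.
    """
    levels = sorted({v for row in image for v in row})
    regions: Set[FrozenSet[Pixel]] = set()
    for t in levels:
        bright = {(r, c)
                  for r, row in enumerate(image)
                  for c, v in enumerate(row) if v >= t}
        regions.update(components(bright))
    return sorted(regions, key=lambda reg: (len(reg), sorted(reg)))


# Toy image (hypothetical): two bright peaks joined by a dimmer bridge.
toy_image = [
    [0, 0, 0, 0, 0],
    [0, 3, 1, 2, 0],
    [0, 0, 0, 0, 0],
]
toy_regions = extremal_regions(toy_image)
```

On this toy image the two peaks appear as separate single-pixel regions at high thresholds and merge into one three-pixel blob at a lower threshold. Any two of the resulting regions are either nested or disjoint, which is the property that allows them to be organized into a tree or forest.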

We conduct a set of experiments with synthetic and real microscopy images and show that the proposed method achieves very good detection accuracies despite large amounts of overlap and very low effective spatial resolution. We assess the effect of the different elements of our detection system and show that the combination of them results in the highest accuracy. The resulting system outperforms other methods for instance detection in microscopy images and is comparable in counting accuracy with the methods that are trained to count (and do not perform detection). While microscopic image modalities form a natural domain for our method, the proposed approach is general and can be applied to macroscopic medical or non-medical images, as demonstrated in Arteta et al. (2013).

This paper extends the previous conference papers (Arteta et al., 2012, 2013) that developed the initial approach. In comparison to the more recent conference version (Arteta et al., 2013), this paper adds the following extensions: (i) it develops in more detail the inference procedure on the tree-structured graphical model (Section 5); (ii) it provides further evaluation and insights into the loss function, along with a new variant of it (Section 6.1); (iii) it proposes a method for picking a linear combination of input channels that optimizes the method’s performance (Section 7); and (iv) it provides additional experiments with challenging microscopy images (Section 8).

Section snippets

Instance detection in crowded scenes

Most computer vision methods that address the task of understanding images with multiple overlapping objects fall into two classes. The first is based on individual object detection. Such detection can be based on a sliding window or the Hough transform, followed by an appropriate non-maxima suppression procedure (Barinova et al., 2012; Desai et al., 2011; Leibe et al., 2008), stochastic fitting of interacting particles or object models (Descombes, Minlos,

Datasets

We first introduce the datasets and metrics used to evaluate the performance of our system before describing the system itself. Six distinct microscopy datasets spanning different modalities and imaging conditions are used. For each of the datasets, the data used for training are divided into several random splits, which are later used to compute means and standard deviations of the evaluation metrics. The task, in all cases, is to detect all instances of the objects (i.e. cells) which have

Model overview

For an input image $I$ containing multiple instances of an object class (some of which may be overlapping) we want to automatically detect the instances and provide an estimate of their location. We start by generating a pool of N nested regions (see Fig. 3 for a case where N = 13), such that for each pair of regions R_i and R_j in the pool, these regions are either nested (i.e. R_i ⊂ R_j or R_i ⊃ R_j) or they do not overlap (R_i ∩ R_j = ∅). In the simplest case, a pool can comprise extremal regions of
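
The nestedness property described above is what allows the pool to be arranged as a tree or forest. The sketch below, in Python, checks the property and links each region to its smallest strict superset. Representing regions as sets of pixel coordinates, the function names (is_nested_pool, build_forest) and the toy pool are assumptions made for illustration, and the pool is assumed to contain no duplicate regions; the quadratic search is chosen for clarity, not speed.

```python
from typing import FrozenSet, List, Optional, Sequence, Tuple

Pixel = Tuple[int, int]
Region = FrozenSet[Pixel]


def is_nested_pool(regions: Sequence[Region]) -> bool:
    """True if every pair of regions is either strictly nested or disjoint."""
    for i, a in enumerate(regions):
        for b in regions[i + 1:]:
            if (a & b) and not (a < b or b < a):
                return False
    return True


def build_forest(regions: Sequence[Region]) -> List[Optional[int]]:
    """Return, for each region, the index of its parent (None for a root).

    The parent of a region is its smallest strict superset in the pool.
    """
    parents: List[Optional[int]] = []
    for r in regions:
        best: Optional[int] = None
        for j, s in enumerate(regions):
            if r < s and (best is None or len(s) < len(regions[best])):
                best = j
        parents.append(best)
    return parents


# Hypothetical pool of five regions on a pixel grid.
pool = [
    frozenset({(0, 0), (0, 1), (0, 2), (0, 3)}),  # 0: large blob
    frozenset({(0, 0), (0, 1)}),                  # 1: inside region 0
    frozenset({(0, 0)}),                          # 2: inside region 1
    frozenset({(0, 3)}),                          # 3: inside region 0
    frozenset({(5, 5)}),                          # 4: isolated region
]
pool_parents = build_forest(pool)  # [None, 0, 1, 0, None]
```

The parent list encodes a forest with two roots (regions 0 and 4). Two regions overlap exactly when one is an ancestor of the other, so selecting non-overlapping regions amounts to choosing nodes such that none is an ancestor of another.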

Inference on the model

Given a set of nested candidate regions, let V_i(d) denote the classifier score of a region R_i for class d (the higher the score, the more this region looks like a typical region containing d object centroids). For notational simplicity, we also define V_i(0) = 0. We introduce the optimization variables y = {y_i|i = 1…N}, where y_i = 0 means that the region R_i is not selected, and y_i = d ∈ 1…D means that the region R_i is selected and assigned class d. We denote with $Y$ the set of all y that meet the
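
The snippet above breaks off before the constraint set is stated. The following Python sketch therefore assumes, in line with the abstract, that the constraint is non-overlap of the selected regions (no selected region is an ancestor of another in the region forest) and that the objective is the sum of V_i(y_i). Under these assumptions a bottom-up dynamic program compares, at every node, the best score of selecting that region against the best total obtainable from its subtrees. The function name select_regions, the parent-list encoding and the toy scores are hypothetical, and this is not claimed to reproduce the paper's exact procedure.

```python
from typing import List, Optional, Sequence, Tuple


def select_regions(parents: Sequence[Optional[int]],
                   scores: Sequence[Sequence[float]]) -> Tuple[List[int], float]:
    """Pick non-overlapping regions and class labels maximizing the total score.

    parents[i] is the parent of region i in the forest (None for a root).
    scores[i][d - 1] plays the role of V_i(d) for d = 1..D; V_i(0) = 0.
    Returns (labels, total), where labels[i] = 0 means "not selected".
    """
    n = len(parents)
    children: List[List[int]] = [[] for _ in range(n)]
    roots: List[int] = []
    for i, p in enumerate(parents):
        if p is None:
            roots.append(i)
        else:
            children[p].append(i)

    # Pre-order traversal; its reverse visits every child before its parent.
    order: List[int] = []
    stack = list(roots)
    while stack:
        i = stack.pop()
        order.append(i)
        stack.extend(children[i])

    best = [0.0] * n      # best achievable score within the subtree of i
    chosen = [0] * n      # class chosen for i if i itself is selected, else 0
    for i in reversed(order):
        below = sum(best[c] for c in children[i])
        own, d = max((v, k) for k, v in enumerate(scores[i], start=1))
        if own > below:
            best[i], chosen[i] = own, d
        else:
            best[i] = below

    # Backtrack: once a region is selected, its descendants stay unselected.
    labels = [0] * n
    stack = list(roots)
    while stack:
        i = stack.pop()
        if chosen[i]:
            labels[i] = chosen[i]
        else:
            stack.extend(children[i])
    return labels, sum(best[r] for r in roots)


# Hypothetical forest and scores with D = 2 classes per region.
example_parents = [None, 0, 1, 0, None]
example_scores = [[-1.0, 2.0], [1.5, -0.5], [0.5, -1.0], [1.0, -2.0], [-0.3, -0.7]]
example_labels, example_total = select_regions(example_parents, example_scores)
# example_labels == [0, 1, 0, 1, 0], example_total == 2.5
```

In this example, regions 1 and 3 are each selected with class 1 because their combined score (2.5) exceeds the best score of their common parent, region 0 (2.0 for class 2). Region 4 is left unselected because all of its class scores are negative. Each node is visited a constant number of times, so the cost grows linearly with the number of regions times the number of classes.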

Learning region classifiers

The model for the evaluation of the candidate regions can be trained on weakly annotated (dotted) images and does not require more detailed annotations (e.g. bounding boxes). Thus, we assume that we are given a set of images annotated with dots, where each dot is placed inside each instance of the object. The learning is driven by an instance count loss (IC-loss) (3), denoted as L_IC, that penalizes all deviations from the one-to-one correspondences between annotation dots and the selected

Crafting a surface for extremal region computation

Collecting extremal regions as candidates for object detection from the intensity channel of microscopy images is often successful (Arteta et al., 2012), but not optimal. For example, images with high levels of noise (i.e. weak-fluorescence images – Fig. 2b), low contrast or images with highly inhomogeneous objects can break the assumption that there exist extremal regions which can approximately represent each of the objects of interest or even a weaker assumption that extremal regions

Experiments and results

We now evaluate the performance of the model on the datasets described in Section 3. Within these experiments, the full system refers to the method described in this paper using the penalization function Δ^g (see Section 6.1) and the surface learning (see Section 7). The usage of the optimized surface, however, is determined via cross validation. Thus, the surface is not enabled for datasets where it is not beneficial. Instead, extremal regions are collected directly from the intensity channel

Summary and conclusion

We have presented a method for object detection in microscopy images (extending Arteta et al., 2012, 2013) which is particularly suitable for images with multiple overlapping instances of an object. Depending on the difficulty of the detection task, the model has the flexibility to choose to detect overlapping objects in groups containing a variable number of instances, as well as individual instances if the task is easy. Such ability

Acknowledgments

We acknowledge the researchers of the Laboratory for Viral RNA Biochemistry, Institute of Protein Research RAS, for providing the images and annotations of gels with molecular colonies, Dr. Svetlana Uzbekova (INRA, Physiology of Reproduction and Behavior Unit, Nouzilly, France) for providing the images of fluorescent nuclear stained bovine blastocysts, Dr. Boris Vojnovic and James Thompson (Grey Institute for Radiation Oncology and Biology, University of Oxford, UK) for providing the equipment

References (35)

Bernardis, E., et al., 2011. Pop out many small structures from a very large microscopic image. Med. Image Anal.
Matas, J., et al., 2004. Robust wide-baseline stereo from maximally stable extremal regions. Image Vis. Comput.
Samatov, T. R., et al., 2006. Real-time monitoring of DNA colonies growing in a polyacrylamide gel. Analytical Biochemistry.
Al-Kofahi, Y., et al., 2010. Improved automatic detection and segmentation of cell nuclei in histopathology images. IEEE Trans. Biomed. Eng.
Arteta, C., et al. Learning to detect cells using non-overlapping extremal regions.
Arteta, C., et al., 2013. Learning to detect partially overlapping instances. Computer Vision and Pattern Recognition (CVPR), 2013 IEEE Conference.
Barinova, O., et al., 2012. On detection of multiple object instances using Hough transforms. IEEE Trans. Pattern Anal. Mach. Intell.
Chan, A. B., et al., 2009. Bayesian Poisson regression for crowd counting. Computer Vision, 2009 IEEE 12th International Conference on.
Chetverin, A. B., Chetverina, H. V., 1997. Method for amplification of nucleic acids in solid media. US Patent...
Desai, C., et al., 2011. Discriminative models for multi-class object layout. Int. J. Comput. Vision.
Descombes, X., et al., 2009. Object extraction using a stochastic birth-and-death dynamics in continuum. J. Math. Imaging Vis.
Dong, L., et al., 2007. Fast crowd segmentation using shape indexing. Computer Vision, 2007. ICCV 2007. IEEE 11th International Conference.
Donoser, M., et al., 2006. 3D segmentation by maximally stable volumes (MSVS). Pattern Recognition, 2006. ICPR 2006. 18th International Conference.
Fiaschi, L., et al., 2012. Learning to count with regression forest and structured labels. Pattern Recognition (ICPR), 2012 21st International Conference.
Gurcan, M. N., et al., 2010. Pattern recognition in histopathological images: An ICPR 2010 contest. Recognizing Patterns in Signals, Speech, Images and Videos.
Kong, D., et al., 2006. A viewpoint invariant approach for crowd counting. Pattern Recognition, 2006. ICPR 2006. 18th International Conference.

