1 Introduction

Hierarchical image segmentation provides a multi-scale approach to image analysis. Hierarchical image analysis was pioneered by [11] and has since received considerable attention, as attested by the popularity of [1]. Mathematical morphology has been used in hierarchical image analysis with, e.g., hierarchical watersheds [3, 12], binary partition trees [14], quasi-flat zone hierarchies [10], and tree-based shape spaces [17]. Other methods for hierarchical image analysis consider regular and irregular pyramids [9], scale-set theory [7], multiscale combinatorial grouping [13], and series of optimization problems [16].

A hierarchical image segmentation is a series of segmentations of the same image at different detail levels, in which the segmentations at coarser levels are obtained by merging regions from the segmentations at finer levels. Consequently, the regions at finer levels are nested in the regions at coarser levels. The level of a segmentation in a hierarchy is also called an observation scale. In [8], Guimarães et al. proposed a hierarchical graph-based image segmentation (HGB) method based on the Felzenszwalb-Huttenlocher dissimilarity measure. The HGB method computes, for each edge of a graph, the minimum observation scale in a hierarchy at which the two regions linked by this edge should merge according to the dissimilarity.

In this article, we provide a formal definition of the criterion that is implicitly used in the HGB method. Then, we show that this criterion is not increasing with respect to the observation scales. An important consequence of this observation is that selecting the minimum observation scale for which the criterion holds true, as done in the original HGB method, is not the only strategy that makes sense with respect to practical needs. Hence, following a recent trend in mathematical morphology of studying non-increasing criteria on hierarchies (see, e.g., [17]), we investigate scale selection strategies, leading to new variations of the original HGB method. The proposed methods are assessed with the evaluation framework of [1]. The assessment shows that some of the proposed variations significantly outperform the original HGB method (see the illustration in Fig. 1).

Fig. 1. Saliency maps resulting from the HGB method using the original observation scale (middle) and from one of our proposed observation scale selection strategies (right).

Section 2 presents the basic notions underlying the HGB method. Section 3 discusses the non-increasing property of the criterion used by the HGB method and introduces an algorithm to find all the scales (associated with a given edge) that satisfy the criterion. Section 4 presents a series of strategies to select different observation scales based on this non-increasing criterion, and Sect. 5 provides a comparative analysis between the proposed strategies and the original HGB method.

2 Hierarchical Graph-Based Image Segmentation

This section explains the hierarchical graph-based image segmentation (HGB) method [8]. We first give a series of necessary notions, such as quasi-flat zone hierarchies [10], and then describe the HGB method.

2.1 Basic Notions

Hierarchies. Given a finite set V, a partition of V is a set \(\mathbf {P}\) of nonempty disjoint subsets of V whose union is V. Any element of \(\mathbf {P}\) is called a region of \(\mathbf {P}\). Given two partitions \(\mathbf {P}\) and \(\mathbf {P}^\prime \) of V, \(\mathbf {P}^\prime \) is said to be a refinement of \(\mathbf {P}\), denoted by \(\mathbf {P}^{\prime } \preceq \mathbf {P}\), if any region of \(\mathbf {P}^\prime \) is included in a region of \(\mathbf {P}\). A hierarchy on V is a sequence \(\mathcal {H}= (\mathbf {P}_0, \dots , \mathbf {P}_\ell )\) of partitions of V, such that \(\mathbf {P}_{i-1} \preceq \mathbf {P}_i\), for any \(i \in \{ 1, \dots , \ell \}\).
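For illustration, the following Python sketch (ours, not part of [8]) checks the refinement relation on partitions represented as lists of vertex sets:

```python
def is_refinement(P_fine, P_coarse):
    """P_fine refines P_coarse: every region of P_fine is included in some region of P_coarse."""
    return all(any(r <= s for s in P_coarse) for r in P_fine)

# ({a}, {b}, {c, d}) refines ({a, b}, {c, d}), but ({a, c}, {b, d}) does not.
print(is_refinement([{'a'}, {'b'}, {'c', 'd'}], [{'a', 'b'}, {'c', 'd'}]))  # True
print(is_refinement([{'a', 'c'}, {'b', 'd'}], [{'a', 'b'}, {'c', 'd'}]))    # False
```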

Graph and Connected-Component Partition. A graph is a pair \(G=(V,E)\) where V is a finite set and E is a subset of \(\{ \{ x, y \} \subseteq V | x \ne y \}\). Each element of V is called a vertex of G, and each element of E is called an edge of G. A subgraph of G is a graph \((V^{\prime }, E^{\prime })\) such that \(V^{\prime } \subseteq V\) and \(E^{\prime } \subseteq E\). If X is a graph, its vertex and edge sets are denoted by V(X) and E(X), respectively.

If two vertices of a graph G are joined by an edge, we say that they are adjacent. The reflexive–transitive closure of this adjacency relation on the (finite) vertex set V(G) is the connectivity relation on V(G). It is an equivalence relation, whose equivalence classes are called the connected components of G. We denote by \(\mathbf {C}(G)\) the set of all connected components of G. Note that \(\mathbf {C}(G)\) is a partition of V(G), called the connected-component partition induced by G.

Quasi-flat Zone Hierarchies. Given a graph \(G = (V,E)\), let w be a map from E into the set \(\mathbb {R}\) of real numbers. For any edge u of G, the value w(u) is called the weight of u (for w), and the pair (G, w) is called an edge-weighted graph. From an edge-weighted graph, we now build a series of connected-component partitions that constitutes a hierarchy. Such a hierarchy is called a quasi-flat zone hierarchy of (G, w), and the quasi-flat zone hierarchy transform is a bijection between the hierarchies and a subset of the edge-weighted graphs called the saliency maps [4]. Hence, any edge-weighted graph induces a quasi-flat zone hierarchy, and any hierarchy \(\mathcal {H}\) can be represented by an edge-weighted graph whose quasi-flat zone hierarchy is precisely \(\mathcal {H}\) [4]. This bijection allows us to handle quasi-flat zone hierarchies through edge-weighted graphs.

Given an edge-weighted graph (G, w), let X be a subgraph of G and let \(\lambda \) be a value of \(\mathbb {R}\). The \(\lambda \)-level edge set of X for w is defined by \(w_{\lambda }(X)=\{ u \in E(X) \mid w(u) < \lambda \}\), and the \(\lambda \)-level graph of X for w is defined as the subgraph \(w^{V}_{\lambda }(X)\) of X such that \(w^{V}_{\lambda }(X) = (V(X), w_{\lambda }(X))\). Then, the connected-component partition \(\mathbf {C}(w^{V}_{\lambda }(X))\) induced by \(w^{V}_{\lambda }(X)\) is called the \(\lambda \)-level partition of X for w.
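To make this construction concrete, here is a minimal Python sketch (ours; a plain union-find, not an optimised implementation) that computes the \(\lambda \)-level partition of a graph whose edges are given as 2-element frozensets weighted by a dictionary w:

```python
def level_partition(vertices, edges, w, lam):
    """Connected components of the lambda-level graph: only edges u with w(u) < lam are kept."""
    parent = {v: v for v in vertices}

    def find(v):                      # union-find with path halving
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for u in edges:
        if w[u] < lam:                # u belongs to the lambda-level edge set
            x, y = tuple(u)
            rx, ry = find(x), find(y)
            if rx != ry:
                parent[rx] = ry

    regions = {}
    for v in vertices:
        regions.setdefault(find(v), set()).add(v)
    return list(regions.values())

# Example on a path graph: at level 3, only the edges of weight < 3 are kept.
V4 = {0, 1, 2, 3}
E4 = [frozenset({0, 1}), frozenset({1, 2}), frozenset({2, 3})]
w4 = {E4[0]: 1, E4[1]: 5, E4[2]: 2}
print(level_partition(V4, E4, w4, 3))  # [{0, 1}, {2, 3}] (up to ordering)
```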

As we consider only finite graphs and hierarchies, the set of considered level values is reduced to a finite subset of \(\mathbb {R}\) that is denoted by \(\mathbb {E}\) in the remaining parts of this article. In order to browse the values of this set and to round real values to values of \(\mathbb {E}\), we define, for any \(\lambda \in \mathbb {R}\): \(\text{ p }_{\mathbb {E}}\left( {\lambda } \right) = \max \{ \mu \in \mathbb {E} \cup \{-\infty \}\mid \mu < \lambda \}\), \(\text{ n }_{\mathbb {E}}\left( {\lambda } \right) = \min \{\mu \in \mathbb {E} \cup \{\infty \}\mid \mu > \lambda \}\) and \(\hat{\text{ n }}_{\mathbb {E}}\left( {\lambda } \right) = \min \{\mu \in \mathbb {E} \cup \{\infty \}\mid \mu \ge \lambda \}\).
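These three rounding operators translate directly into code; a small sketch (helper names are ours):

```python
def p_E(E, lam):      # previous value in E (or -infinity)
    return max((mu for mu in E if mu < lam), default=float('-inf'))

def n_E(E, lam):      # next value in E (or +infinity)
    return min((mu for mu in E if mu > lam), default=float('inf'))

def n_hat_E(E, lam):  # smallest value of E not below lam (or +infinity)
    return min((mu for mu in E if mu >= lam), default=float('inf'))
```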

Let (G, w) be an edge-weighted graph and let X be a subgraph of G. The sequence of all \(\lambda \)-level partitions of X for w, ordered by increasing value of \(\lambda \), is a hierarchy, defined by \(\mathcal {QFZ}(X,w) = (\mathbf {C}(w^{V}_{\lambda }(X)) \mid \lambda \in \mathbb {E} \cup \{\infty \})\), and called the quasi-flat zone hierarchy of X for w. Let \(\mathcal {H}\) be the quasi-flat zone hierarchy of G for w. Given a vertex x of G and a value \(\lambda \) in \(\mathbb {E}\), the region that contains x in the \(\lambda \)-level partition of the graph G is denoted by \(\mathcal {H}^{\lambda }_x\).
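Reusing level_partition above, the quasi-flat zone hierarchy can be sketched as the sequence of level partitions over all levels; for illustration only, we take \(\mathbb {E}\) to be the set of distinct edge weights:

```python
def qfz_hierarchy(vertices, edges, w):
    """QFZ(X, w): the lambda-level partitions for lambda in E and at +infinity,
    with E taken as the set of distinct weights (an illustrative assumption)."""
    levels = sorted(set(w.values()) | {float('inf')})
    return [(lam, level_partition(vertices, edges, w, lam)) for lam in levels]
```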

Let us consider a minimum spanning tree T of (G, w). It has been shown in [4] that the quasi-flat zone hierarchy \(\mathcal {QFZ}(T,w)\) of T for w is the same as the quasi-flat zone hierarchy \(\mathcal {QFZ}(G,w)\) of G for w. This indicates that the quasi-flat zone hierarchy of G can be handled through its minimum spanning tree.

2.2 Hierarchical Graph-Based Segmentation Method

In this article, we consider that the input is the edge-weighted graph (G, w) representing an image, where the pixels correspond to the vertices of G and the edges link adjacent pixels. The weight of each edge is given by a dissimilarity measure between the linked pixels such as the absolute difference of intensity between them.

Before explaining the HGB method, we first describe the following observation scale dissimilarity [8], which is required by the method and whose idea originates from the region merging criterion proposed in [6].

Observation Scale Dissimilarity. Let \(R_1\) and \(R_2\) be two adjacent regions. The dissimilarity measure compares the so-called inter-component and within-component differences [6]. The inter-component difference between \(R_1\) and \(R_2\) is defined by \(\varDelta _{inter}(R_1, R_2) = \min \{ w\left( \{x,y\} \right) | x \in R_1, y \in R_2, \{ x,y\}\in E(T) \}\), while the within-component difference of a region R is defined by \(\varDelta _{intra}(R) = \max \{ w\left( \{x,y\} \right) | x,y \in R, \{ x,y\}\in E(T) \}\). This leads to the observation scale of \(R_1\) relative to \(R_2\), defined by \(S_{R_2}(R_1) = \left( \varDelta _{inter}(R_1, R_2) - \varDelta _{intra}(R_1) \right) |R_1|\), where \(|R_1|\) is the cardinality of \(R_1\). Then, a symmetric dissimilarity between \(R_1\) and \(R_2\), called the observation scale dissimilarity between \(R_1\) and \(R_2\), is defined by

$$\begin{aligned} D(R_1, R_2) = \max \{ S_{R_2}(R_1), S_{R_1}(R_2) \}. \end{aligned}$$
(1)

In the following, this dissimilarity is used to decide whether two regions should be merged at a given observation scale.
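The following Python sketch (ours) computes D on two disjoint regions given as vertex sets, with tree_edges the edges of T as frozensets and w their weights; it assumes that at least one edge of T links the two regions, and uses the convention that the within-component difference of a single-vertex region is 0:

```python
def observation_scale_dissimilarity(R1, R2, tree_edges, w):
    """D(R1, R2) = max(S_R2(R1), S_R1(R2)), Eq. (1)."""
    def delta_inter(A, B):            # smallest tree-edge weight between A and B
        return min(w[u] for u in tree_edges if len(u & A) == 1 and len(u & B) == 1)

    def delta_intra(R):               # largest tree-edge weight inside R (0 if none)
        inside = [w[u] for u in tree_edges if u <= R]
        return max(inside) if inside else 0

    def scale(A, B):                  # S_B(A) = (Delta_inter(A, B) - Delta_intra(A)) * |A|
        return (delta_inter(A, B) - delta_intra(A)) * len(A)

    return max(scale(R1, R2), scale(R2, R1))
```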

HGB Method. The HGB method [8] is presented in Method 1. The input is a graph G representing an image, with its associated weight function w, from which a minimum spanning tree T of G is computed. Given (T, w), the HGB method computes a new weight function f which leads to a new hierarchy \(\mathcal {H}=\mathcal {QFZ}(T, f)\). The resulting hierarchy \(\mathcal {H}\) is considered as the hierarchical image segmentation of the initial image. Thus, the core of the method is the generation of the weight function f for T.

Method 1. The HGB method: computation of the weight function f on the minimum spanning tree T.

After initializing all values of f to infinity (see Line 1), we compute an observation scale value f(u) for each edge \(u \in E(T)\), the edges being processed in non-decreasing order of the original weights w (see Line 2). Note that each iteration of the loop requires updating the hierarchy \(\mathcal {H}=\mathcal {QFZ}(T, f)\) (see Line 3); an efficient algorithm for this update can be found in [2]. Once \(\mathcal {H}\) is updated, the value \(\lambda _\mathcal {H}^\star (u)\), taken in a finite subset \(\mathbb {E}\) of \(\mathbb {R}\), is obtained by

$$\begin{aligned} \lambda _\mathcal {H}^\star (\{x,y\}) = \min \left\{ \lambda \in \mathbb {E} \mid D\left( \mathcal {H}^{\lambda }_{x}, \mathcal {H}^{\lambda }_{y} \right) \le \lambda \right\} . \end{aligned}$$
(2)

We first consider the regions \(\mathcal {H}^{\lambda }_x\) and \(\mathcal {H}^{\lambda }_y\) at a level \(\lambda \). Using the dissimilarity measure D, we check if \(D \left( \mathcal {H}^{\lambda }_x, \mathcal {H}^{\lambda }_y \right) \le \lambda \). Equation (2) states that the observation scale \(\lambda _\mathcal {H}^\star (\{x,y\})\) is the minimum value \(\lambda \) for which this assertion holds.
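Putting the previous snippets together, a compact (non-optimised) sketch of Method 1 reads as follows; the levels argument plays the role of \(\mathbb {E}\), and, contrary to the efficient implementation of [2], the hierarchy is naively recomputed at each step:

```python
import math

def hgb(vertices, tree_edges, w, levels):
    """Sketch of Method 1: compute the new weight function f on the MST T."""
    f = {u: math.inf for u in tree_edges}                # Line 1: f(u) := +infinity
    for u in sorted(tree_edges, key=lambda e: w[e]):     # Line 2: non-decreasing order of w
        x, y = tuple(u)
        # Line 3 and Eq. (2): on the current hierarchy H = QFZ(T, f), find the
        # smallest lambda in E such that D(H^lambda_x, H^lambda_y) <= lambda.
        for lam in sorted(levels):
            P = level_partition(vertices, tree_edges, f, lam)
            Rx = next(R for R in P if x in R)
            Ry = next(R for R in P if y in R)
            if observation_scale_dissimilarity(Rx, Ry, tree_edges, w) <= lam:
                f[u] = lam
                break
    return f
```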

3 Observation Scale Intervals

3.1 Non-increasing Observation Criterion

In this section, we provide a formal definition of the observation criterion involved in Eq. (2). Then, we discuss its non-increasing behaviour, which opens the door to new strategies for selecting relevant observation scale values based on the Felzenszwalb-Huttenlocher dissimilarity measure used in Method 1.

In the remaining part of this section, we consider that \(\mathcal {H}\) is any hierarchy and that \(u = \{x,y\}\) is any edge of T.

Let \(\lambda \) be any element in \(\mathbb {E}\). We say that \(\lambda \) is a positive observation scale (for \((\mathcal {H},u)\)) whenever \(D(\mathcal {H}_x^\lambda , \mathcal {H}_y^\lambda ) \le \lambda \). We denote by \(\mathcal {T}\) the Boolean criterion such that \(\mathcal {T}(\lambda )\) is true if and only if \(\lambda \) is a positive observation scale. The criterion \(\mathcal {T}\) is called the observation criterion. Dually, if \(\lambda \) is not a positive observation scale, then we say that \(\lambda \) is a negative observation scale (for \((\mathcal {H},u)\)). If \(\lambda \) is a negative observation scale, then we have \(D(\mathcal {H}_x^\lambda , \mathcal {H}_y^\lambda ) > \lambda \).

Observe that the value \(\lambda ^{\star }_{\mathcal {H}}(x,y)\) defined in Eq. (2) is simply the lowest element \(\lambda \) of \(\mathbb {E}\) such that \(\mathcal {T}(\lambda )\) is true. Dually, we denote by \(\overline{\lambda }^{\star }_{\mathcal {H}}(x,y)\) the largest negative observation scale.

Intuitively, a positive observation scale corresponds to a level of the hierarchy \(\mathcal {H}\) at which the two regions linked by u should be merged according to the observation criterion \(\mathcal {T}\), which is based on the dissimilarity measure D. On the other hand, a negative observation scale corresponds to a level of the hierarchy at which the two associated regions should remain disjoint. A desirable property would be that the observation criterion \(\mathcal {T}\) be increasing with respect to scales, a Boolean criterion \(\mathcal {T}\) being increasing whenever, for any scale value \(\lambda \in \mathbb {E}\), \(\mathcal {T}(\lambda )\) holding true implies that \(\mathcal {T}(\lambda ^\prime )\) holds true for any scale \(\lambda ^\prime \) greater than \(\lambda \). Indeed, in such a desirable case, any level in \({\mathbb {E}}\) not lower than \(\lambda ^\star _{\mathcal {H}}(x,y)\) would be a positive observation scale, whereas any level lower than \(\lambda ^\star _{\mathcal {H}}(x,y)\) would be a negative scale. In other words, we would have \(\lambda ^\star _{\mathcal {H}}(x,y) = \text{ n }_{\mathbb {E}}\left( {\overline{\lambda }^{\star }_{\mathcal {H}}(x,y)} \right) \). Hence, it could easily be argued that the observation scale of the edge u must be set to \(\lambda ^\star _{\mathcal {H}}(x,y)\). However, in general, the criterion \(\mathcal {T}\) is not increasing (see a counterexample in Fig. 2) and we may have \(\lambda ^\star _{\mathcal {H}}(x,y) < \text{ n }_{\mathbb {E}}\left( {\overline{\lambda }^{\star }_{\mathcal {H}}(x,y)} \right) \). Therefore, it is interesting to investigate strategies for selecting a significant observation scale between \(\lambda ^\star _{\mathcal {H}}(x,y)\) and \(\overline{\lambda }^{\star }_{\mathcal {H}}(x,y)\) (see in Fig. 3 a graphical illustration of the different situations that may occur). In other words, such a strategy amounts to transforming the criterion \(\mathcal {T}\) into an increasing criterion \(\mathcal {T}'\).
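In terms of code, the two extremal scales can be sketched as follows (criterion(lam) standing for \(\mathcal {T}(\lambda )\), i.e., the test \(D(\mathcal {H}_x^\lambda , \mathcal {H}_y^\lambda ) \le \lambda \)):

```python
def lambda_star(levels, criterion):
    """Smallest positive observation scale, i.e., the min-rule value of Eq. (2)."""
    return min((lam for lam in levels if criterion(lam)), default=float('inf'))

def lambda_bar_star(levels, criterion):
    """Largest negative observation scale, i.e., the max-rule value."""
    return max((lam for lam in levels if not criterion(lam)), default=float('-inf'))
```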

Fig. 2. Counterexample to the increasingness of the observation criterion \(\mathcal {T}\): for the edge \(u =\{x,y\}\) with \(x= f\) and \(y =g\), we have \(D(\mathcal {H}^{1}_{x}, \mathcal {H}^{1}_{y}) = 17\), \(D(\mathcal {H}^{23}_{x}, \mathcal {H}^{23}_{y}) = 17\), and \(D(\mathcal {H}^{25}_{x}, \mathcal {H}^{25}_{y}) = 28\). Hence, we have \(\mathcal {T}(1) = false\), \(\mathcal {T}(23) = true\), and \(\mathcal {T}(25) = false\), which proves that \(\mathcal {T}\) is not increasing.

In the framework of mathematical morphology, non-increasing regional attributes/criteria are known to be useful but difficult to handle. Several rules or strategies to handle non-increasing criteria have been considered in the context of connected filters. Among them, one may cite the min- and max-rules [15] or the Viterbi [15] and the shape-space filtering [17] strategies. Note that the strategy adopted in Eq. (2) corresponds to the min-rule and that the strategy consisting of selecting \(\overline{\lambda }^{\star }_{\mathcal {H}}(x,y)\) corresponds to the max-rule. Our main goal in this article is to investigate other strategies to efficiently handle the non-increasing observation criterion \(\mathcal {T}\) in the context of hierarchical segmentation and edge-observation scale selection based on the Felzenszwalb-Huttenlocher region dissimilarity measure. Before presenting our proposed selection strategies, we first define positive and negative observation intervals together with an algorithm to compute them.

Fig. 3. Illustration of possible observation scale selection strategies. The positive observation intervals are represented in gray. On the left, the min-, the lower \(\alpha \)-length and the lower p-rank selection strategies select the scales \(\lambda _1\), \(\lambda _2\) and \(\lambda _3\), respectively (for a value of \(\alpha \) slightly larger than the length of the leftmost gray interval and for \(p =0.3\)), whereas, on the right, the max-, the upper \(\alpha \)-length and the upper p-rank selection strategies select the scales \(\lambda _4\), \(\lambda _5\) and \(\lambda _6\), respectively.

3.2 Algorithm for Computing Observation Intervals

Let \(\lambda _{1}\) and \(\lambda _{2}\) be any two elements of \(\mathbb {E} \cup \{-\infty \}\) such that \(\lambda _1 < \lambda _2\). We denote by \(\rrbracket \lambda _1, \lambda _2 \rrbracket _{\mathbb {E}}\) the subset of \(\mathbb {E}\) that contains every element of \(\mathbb {E}\) that is both greater than \(\lambda _{1}\) and not greater than \(\lambda _{2}\): \(\rrbracket \lambda _1, \lambda _2 \rrbracket _{\mathbb {E}} = \{\lambda \in \mathbb {E} \; | \;\lambda _1 < \lambda \le \lambda _2\}\). We say that a subset I of \(\mathbb {E}\) is an open-closed interval of \(\mathbb {E}\), or simply an interval, if there exist two values \(\lambda _1\) in \(\mathbb {E} \cup \{-\infty \}\) and \(\lambda _2\) in \(\mathbb {E}\) such that I is equal to \(\rrbracket \lambda _1, \lambda _2 \rrbracket _{\mathbb {E}}\).

Definition 1

(observation interval). Let \(\mathcal {H}\) be any hierarchy, let u be any edge in E(T), and let I be an interval. We say that I is a positive observation interval (resp. a negative observation interval) for \((\mathcal {H},u)\) if the two following statements hold true:

  1. any element in I is a positive (resp. negative) observation scale for \((\mathcal {H},u)\); and

  2. I is maximal among all intervals for which statement (1) holds true, i.e., any interval which is a proper superset of I contains a negative (resp. positive) observation scale for \((\mathcal {H},u)\).

The set of all positive (resp. negative) observation intervals is denoted by \(\varLambda _{\mathcal {H}}( u)\) (resp. by \(\overline{\varLambda }_{\mathcal {H}}(u)\)).

In order to compute \(\varLambda _{\mathcal {H}}( \{x,y\})\), we follow the strategy presented in [2], which relies on the component tree of the hierarchy \(\mathcal {H}\). The component tree of \(\mathcal {H}\) is the pair \(\mathcal {T}_{\mathcal {H}} = (\mathcal {N}, parent)\) such that \(\mathcal {N}\) is the set of all regions of \(\mathcal {H}\) and such that a region \(R_1\) in \(\mathcal {N}\) is the parent of a region \(R_2\) in \(\mathcal {N}\) whenever \(R_1\) is a minimal (for the inclusion relation) proper superset of \(R_2\). Note that every region in \(\mathcal {N}\) has exactly one parent, except the region V, which has no parent and is called the root of the component tree of \(\mathcal {H}\). Any region which is not the parent of another one is called a leaf of the tree. It can be observed that any singleton of V is a leaf of \(\mathcal {T}_{\mathcal {H}}\) and that, conversely, any leaf of \(\mathcal {T}_{\mathcal {H}}\) is a singleton of V. The level of a region R in \(\mathcal {H}\) is the highest index of a partition that contains R in \(\mathcal {H}\). Then, the proposed algorithm, whose precise description is given in Algorithm 1, browses in increasing order the levels of the regions containing x and y until finding a value \(\lambda \) such that \(D(\mathcal {H}^{\lambda }_x,\mathcal {H}^{\lambda }_y ) \le \lambda \). This value is \(\lambda ^\star _{\mathcal {H}}(x,y)\), as defined by Eq. (2); it is also the lower bound of the first positive observation interval. If we keep browsing the levels of the regions containing x and y in this tree, as long as \(D(\mathcal {H}^{\lambda }_x,\mathcal {H}^{\lambda }_y ) \le \lambda \), we can identify the upper bound of this first positive observation interval. We can further continue to browse the levels of the regions containing x and y in the tree in order to identify all positive observation intervals. Therefore, at the end of the execution, we can return the set \(\varLambda _{\mathcal {H}}( \{x,y\})\) of all positive observation intervals. From the set \(\varLambda _{\mathcal {H}}( \{x,y\})\), we can obtain by duality the set \(\overline{\varLambda }_{\mathcal {H}}( \{x,y\})\) of all negative observation intervals.

The time complexity of Algorithm 1 depends linearly on the number of regions in the branches of the component tree of \(\mathcal {H}\) containing x and y since it consists of browsing all these regions from the leaves to the root. In the worst case, at every level of the hierarchy the region containing x is merged with a singleton region. Hence, as there are |V| vertices in G, in this case, the branch of x contains |V| regions. Thus, the worst-case time complexity of Algorithm 1 is O(|V|). However, in many practical cases, the component tree of \(\mathcal {H}\) is well balanced and each region of \(\mathcal {H}\) results from the merging of two regions of (approximately) the same size. Then, if the tree is balanced, the branch of x contains \(O(\log _2(|V|))\) nodes and the time complexity of Algorithm 1 reduces to \(O(\log _2 (|V|))\).
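For illustration, the positive observation intervals can also be obtained by a direct enumeration of the scales of \(\mathbb {E}\) (a simple sketch, not Algorithm 1 itself, which instead browses the branches of the component tree):

```python
def positive_observation_intervals(levels, criterion):
    """Maximal runs of positive observation scales, as open-closed pairs (lo, hi]."""
    levels = sorted(levels)
    intervals, run_start, last_pos = [], None, None
    for prev, lam in zip([float('-inf')] + levels, levels):
        if criterion(lam):
            if run_start is None:
                run_start = prev      # exclusive lower bound
            last_pos = lam            # inclusive upper bound so far
        elif run_start is not None:
            intervals.append((run_start, last_pos))
            run_start = None
    if run_start is not None:
        intervals.append((run_start, last_pos))
    return intervals
```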

4 Selecting Observation Scales

Let K be any subset of \(\mathbb {E}\). We consider the two following selection rules from K to set the value of f(u) in Method 1:

$$\begin{aligned} \text {min-rule: } f(u) :=&\min \{k \in K\}\text {; and}\\ \text {max-rule: } f(u) :=&\max \{k \in K\}. \end{aligned}$$

When K is the set of the positive observation scales, the result obtained with the min-rule is called the min-selection strategy. Note that the results obtained with the min-selection strategy correspond exactly to the results obtained with the method presented in [2, 8], as described by Eq. (2). In this article, we also consider the max-selection strategy, that is, the result obtained with the max-rule when K is the set of the negative observation scales. Furthermore, we also apply these rules to filtered sets of positive and of negative observation scales. The motivation for introducing these strategies is to regularise the observation criterion with respect to scales in order to cope with situations such as the ones shown in Fig. 3. As filters, we investigate the well-known rank and area filters.

Algorithm 1. Computation of the positive observation intervals \(\varLambda _{\mathcal {H}}(\{x,y\})\).

Let us first provide a precise definition of the rank filters that we apply to the positive and to the negative observation scales. The intuitive idea of the selection strategies based on these filters is to remove a lower percentile of the positive observation scales, considered as non-significant, before applying the min-rule, and to remove an upper percentile of the negative observation scales before applying the max-rule.

Let K be any subset of \(\mathbb {E}\) with n elements. Let k be any non-negative integer less than n. We denote by \(\text{ rank }_{k/n}(K)\) the element e of K such that there are exactly k distinct elements in K which are less than e. Let p be any real value between 0 and 1; we set \(\text{ rank }_{p}(K) = \text{ rank }_{\lfloor p \cdot n \rfloor / n}(K)\), where \(\lfloor p \cdot n \rfloor \) is the largest integer which is not greater than the real value \(p \cdot n\).
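A direct transcription in Python (a sketch; K is assumed to be a set of distinct values and \(0 \le p < 1\)):

```python
import math

def rank_p(K, p):
    """rank_p(K): the element of K having exactly floor(p * |K|) smaller elements in K."""
    ks = sorted(set(K))
    return ks[math.floor(p * len(ks))]
```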

Let p be any real value between 0 and 1. A p-positive observation scale for \((\mathcal {H},u)\) is any positive observation scale for \((\mathcal {H},u)\) that is greater than \(\text{ rank }_p(K)\), where K is the set of all positive observation scales not greater than \(\overline{\lambda }^{\star }_{\mathcal {H}}(x,y)\). A p-negative observation scale for \((\mathcal {H},u)\) is any negative observation scale for \((\mathcal {H},u)\) that is less than \(\text{ rank }_{1-p}(K)\), where K is the set of all negative observation scales not less than \(\lambda ^{\star }_{\mathcal {H}}(x,y)\). The min-rule applied to the set of all p-positive observation scales is called the lower p-rank selection strategy, while the max-rule applied to the set of all p-negative observation scales is called the upper p-rank selection strategy.
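A sketch of the two resulting strategies, reusing rank_p above (degenerate cases, e.g. an empty set of candidates after filtering, are not handled):

```python
def lower_p_rank(levels, criterion, p):
    """Lower p-rank selection: min-rule over the p-positive observation scales."""
    pos = sorted(lam for lam in levels if criterion(lam))
    lam_bar = max((lam for lam in levels if not criterion(lam)), default=float('-inf'))
    K = [lam for lam in pos if lam <= lam_bar]   # positive scales not above the largest negative one
    if not K:                                    # criterion already increasing: plain min-rule
        return pos[0]
    return min(lam for lam in pos if lam > rank_p(K, p))

def upper_p_rank(levels, criterion, p):
    """Upper p-rank selection: max-rule over the p-negative observation scales."""
    neg = sorted(lam for lam in levels if not criterion(lam))
    lam_star = min((lam for lam in levels if criterion(lam)), default=float('inf'))
    K = [lam for lam in neg if lam >= lam_star]  # negative scales not below the smallest positive one
    if not K:                                    # criterion already increasing: plain max-rule
        return neg[-1]
    return max(lam for lam in neg if lam < rank_p(K, 1 - p))
```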

Let us now describe the selection strategies obtained by applying an area filter on the positive and on the negative observation scales before applying the min- and max-rules. Let \(\mathcal {H}\) be any hierarchy, let \(\{x,y\}\) be any edge of G and let \(\alpha \) be any positive integer. We set \(\varLambda ^\alpha _{\mathcal {H}}( \{x,y\}) = \{ \rrbracket \lambda _1,\lambda _2 \rrbracket _\mathbb {E} \in \varLambda _{\mathcal {H}}( \{x,y\}) \; | \;\lambda _2 - \lambda _1 \ge \alpha \}\) and \(\overline{\varLambda }^\alpha _{\mathcal {H}}( \{x,y\}) = \{\rrbracket \lambda _1,\lambda _2 \rrbracket _\mathbb {E} \in \overline{\varLambda }_{\mathcal {H}}( \{x,y\}) \; | \;\lambda _2 - \lambda _1 \ge \alpha \}\). The min-rule applied to the set \(\cup \varLambda ^\alpha _{\mathcal {H}}( \{x,y\})\) and the max-rule applied to the set \(\cup \overline{\varLambda }^\alpha _{\mathcal {H}}( \{x,y\})\) are called the lower \(\alpha \)-length selection strategy and the upper \(\alpha \)-length selection strategy, respectively.
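The \(\alpha \)-length strategies admit an equally direct sketch, applied to the intervals computed by positive_observation_intervals above (or to their negative counterparts), again without handling the degenerate case where no interval survives the filter:

```python
def alpha_length_selection(levels, intervals, alpha, rule=min):
    """Keep the intervals ]lo, hi] of length at least alpha, then apply the
    min-rule (lower strategy) or max-rule (upper strategy) to their union."""
    kept = [(lo, hi) for lo, hi in intervals if hi - lo >= alpha]
    union = [lam for lam in levels for lo, hi in kept if lo < lam <= hi]
    return rule(union)
```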

The six selection strategies introduced in this section are illustrated in Fig. 3.

5 Experiments

In this section, we compare the segmentation results obtained with the original HGB method against those obtained with our strategies. To this end, we use the Berkeley Segmentation Dataset (BSDS) and its associated evaluation framework [1]. This dataset consists of 500 natural images of size \(321 \times 481\) pixels. In order to perform a quantitative analysis, we use the F-measure defined from the precision and recall for regions, denoted by \(F_r\). The segmentation is perfect when \(F_r = 1\) and totally different from the ground truth when \(F_r = 0\). Each image is represented by a graph where the vertices are the pixels, where the edges are given by the 4-adjacency relation, and where the weight of any edge is given by the Euclidean distance between the colors of the pixels linked by this edge. We compute for each image a set of segmentations at different scales. From each pair made of an image segmentation and the associated ground truth, we obtain one \(F_r\) value. Then, we either keep the best \(F_r\) value obtained for each image of the database, or keep the \(F_r\) value at a single scale that is constant over the database and chosen to maximize the average \(F_r\) over the whole database. These settings are called the optimal image scale (OIS) and the optimal database scale (ODS), respectively.
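For clarity, the two summary scores can be sketched as follows on an array scores[i, s] holding the \(F_r\) value of image i segmented at scale s (a sketch of the evaluation protocol, not of the official benchmark code of [1]):

```python
import numpy as np

def ods_ois(scores):
    """ODS: one scale for the whole dataset, maximising the mean F_r.
       OIS: the best scale is chosen independently for each image."""
    ods = scores.mean(axis=0).max()
    ois = scores.max(axis=1).mean()
    return ods, ois
```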

Table 1 reports the average \(F_r\) scores for ODS and OIS on the BSDS dataset. As we can observe, the selection strategies based on the max-rule yield much better segmentation results than those based on the min-rule. Furthermore, among the strategies based on the max-rule, the upper p-rank selection shows a slight improvement over the max selection. We also show in Fig. 4 the distribution of the best \(F_r\) scores for our strategies. Figure 1 provides a qualitative comparison between the saliency maps resulting from the HGB method with the min-selection strategy and with our upper p-rank strategy, the latter showing a clear improvement.

Table 1. Average \(F_r\) scores for the BSDS dataset. In the table, avg., param. and med. stand for average, parameter, and median, respectively.
Fig. 4. Distribution of the \(F_r\) scores for ODS and OIS on the BSDS dataset.

6 Conclusions

In this article, we study the HGB method with the aim of proposing new strategies for selecting an observation scale that can lead to better segmentation results. To this end, we propose an algorithm that computes all the scales for which the Felzenszwalb-Huttenlocher dissimilarity measure indicates that the regions should merge (the positive intervals) and, dually, all the scales at which the regions should remain disjoint (the negative intervals). Then, based on the min- and max-rules combined with filtering techniques, we propose several strategies to select scales from both the positive and the negative intervals. We validate the performance of our strategies on the BSDS dataset. The best performance was achieved by our upper p-rank strategy (see Table 1).

As future work, we plan to use other gradients to weight the edges of the graph. Combined with our proposed strategies, this can lead to better segmentation results, as illustrated in Fig. 5.

Fig. 5. Saliency maps resulting from the HGB method with the upper p-rank selection strategy, using as edge weights the Euclidean distance between pixel colors (middle) and the structured edge detector from [5] (right).