Computing CNN Loss and Gradients for Pose Estimation with Riemannian Geometry

Hou, Benjamin; Miolane, Nina; Khanal, Bishesh; Lee, Matthew C. H.; Alansary, Amir; McDonagh, Steven; Hajnal, Jo V.; Rueckert, Daniel; Glocker, Ben; Kainz, Bernhard

doi:10.1007/978-3-030-00928-1_85

Benjamin Hou²⁵,
Nina Miolane²⁶,
Bishesh Khanal^25,27,
Matthew C. H. Lee^25,28,
Amir Alansary²⁵,
Steven McDonagh²⁵,
Jo V. Hajnal²⁷,
Daniel Rueckert²⁵,
Ben Glocker²⁵ &
…
Bernhard Kainz²⁵

Part of the book series: Lecture Notes in Computer Science ((LNIP,volume 11070))

Included in the following conference series:

International Conference on Medical Image Computing and Computer-Assisted Intervention

14k Accesses
11 Citations

Abstract

Pose estimation, i.e. predicting a 3D rigid transformation with respect to a fixed co-ordinate frame in, SE(3), is an omnipresent problem in medical image analysis. Deep learning methods often parameterise poses with a representation that separates rotation and translation. As commonly available frameworks do not provide means to calculate loss on a manifold, regression is usually performed using the L2-norm independently on the rotation’s and the translation’s parameterisations. This is a metric for linear spaces that does not take into account the Lie group structure of SE(3).

In this paper, we propose a general Riemannian formulation of the pose estimation problem, and train CNNs directly on SE(3) equipped with a left-invariant Riemannian metric. The loss between the ground truth and predicted pose (elements of the manifold) is calculated as the Riemannian geodesic distance, which couples together the translation and rotation components. Network weights are updated by back-propagating the gradient with respect to the predicted pose on the tangent space of the manifold SE(3). We thoroughly evaluate the effectiveness of our loss function by comparing its performance with popular and most commonly used existing methods, on tasks such as image-based localisation and intensity-based 2D/3D registration. We also show that hyper-parameters, used in our loss function to weight the contribution between rotations and translations, can be intrinsically calculated from the dataset to achieve greater performance margins.

You have full access to this open access chapter, Download conference paper PDF

Procrustean Regression Networks: Learning 3D Structure of Non-rigid Objects from 2D Annotations

Towards improvement of baseline performance for regression based human pose estimation

Article 26 May 2023

HDM-Net: Monocular Non-rigid 3D Reconstruction with Learned Deformation Model

1 Introduction

Intensity-based registration and landmark matching are the de-facto standards to align data from multiple image sources into a common co-ordinate system. Applications that require intensity-based registration include e.g., atlas-based segmentation [1], motion-compensation [15], tracking [10], or clinical analysis of the data visualised in a standard co-ordinate system. These often require manual initialisation of the alignment since general optimisation methods often cannot find a global minimum from any given starting position on the cost function. An initial rigid registration can be achieved by selecting common landmarks [2] through an iterative agent, which impedes hard real-time constraints or less robustly through local image descriptors [19].

Convolutional Neural Networks (CNNs) have shown promising results for intra and inter modal alignment [4, 10]. These approaches show that information about a learn-able canonical co-ordinate system is encoded directly in the features of an image. Early work in this domain showed that image’s pose (i.e. position and orientation) can be regressed relatively to a canonical alignment from a large set of training images sampling the canonical space [7]. Follow-up formulations for medical applications showed similar success for motion compensation and device localisation [4, 10]. However, these approaches rely on heuristic approximations and manual fine-tuning of the CNN loss used to characterise the poses’ prediction error. This fosters domain shift problems and limits options for interchangeable application of various deep learning pose estimation models.

Contribution: We introduce a new loss function that calculates the geodesic distance of two poses on the SE(3) manifold, from a data-adaptive Riemannian metric. We derive appropriate gradients that are required for CNN back propagation. Our method couples the translation and rotation parameters, and regresses them simultaneously as one parameter on the Lie algebra $\mathfrak {se}(3)$. We show that our loss function is agnostic to the architecture by training different CNNs and can effectively predict poses that are comparable to state-of-the-art methods. In addition, we demonstrate that hyper-parameters tuning for our loss function can be directly calculated from the dataset, thus avoiding long and expensive optimisation searches to boost performance. Finally, we validate quantitatively by benchmarking the performance of our loss function with current state-of-the-art methods, and validate their statistical significance with Student’s t-test.

Background: A pose, i.e. a rigid transformation in 3D, is an element of the Lie group SE(3), the Special Euclidean group in 3D. A pose has two components; a rotation component of group SO(3) and a translation component of $\mathbb {R}^3$. SE(3) has the following matrix representation (homogeneous representation):

$$\begin{aligned} SE(3) = \left\{ X \quad | \quad X = \left[ \begin{array}{c|c} R &{} t \\ \hline 0 &{} 1 \end{array} \right] , t \in \mathbb {R}^3, R \in SO(3) \right\} \end{aligned}$$

(1)

In usual implementations of SE(3), the rotation (SO(3)) can be parameterised in any form as long as the group structure is implicitly imposed. R can be stored as Euler angles, quaternions, axis-angle or SO(3) rotation matrix. The numerical properties of each parameterisation need to be considered carefully, especially when designing deep learning applications, as it can impact efficacy.

Hyunh et al. [5] have shown that Euler parameterisation is not unique, this is undesirable as two different mappings can represent the same rotation. A rotation matrix, carrying 9 parameters, is over parameterised and has a strict ordinance on orthonormality. Non-orthogonal rotation matrix can result in skewed or sheared transformations, making it undesirable also. Quaternion parameterisation are often favoured as it can be mapped to valid transformations after normalisation. However, the parameterisation chosen for the rotation is almost never coupled with the parameterisation of the translation, thus denying the intrinsic structure of $SE(3)=SO(3) \ltimes \mathbb {R}^3$. Here, we choose to represent the rotation and the translation together as an element of the Lie algebra of SE(3), i.e. its tangent space at the identity element of the group, denoted $\mathfrak {se}(3)$. It represents the best linear approximation of SE(3) around its identity element. Since the Lie group SE(3) is 6-dimensional, an element of $\mathfrak {se}(3)$ is a 6D vector.

One can define a collection of distances on SE(3), which can be used as loss functions in deep learning applications. A popular choice for the loss is the Euclidean distance associated to the L2-norm. However, the L2-norm is not desirable on SE(3) since it does not respect the manifold’s non-linearity and can lead to unpredictable behaviours. It is also undesirable to use two separates L2-norms on SO(3) and $\mathbb {R}^3$ since SE(3) is not a direct product, and SO(3) itself is non-linear: this can be observed visually with quaternions, e.g., the Euclidean distance of two quaternions can be small, despite the rotation being large. This disparity causes network weight updates to be sub optimal. Hence it is desirable to have a loss function that respects the structure of SE(3).

Related Work: Popular deep learning frameworks, such as Caffe, TensorFlow, Theano, PyTorch, do not provide the means to regress on SE(3), as the common losses used are cross-entropy for probabilities or a p-norms for distances.

Kendall et al. [7] uses the L2-norm to regress parameters on the Lie algebra $\mathfrak {se}(3)$ directly, with a $\beta $ parameter to weight the contribution between rotation and translation. This was similarly performed by authors in [9, 10, 13, 17], who use the predicted parameters for registration tasks. Alternatively, [8, 16] re-parameterised the pose parameters as projected co-ordinates on a 2D view plane. This was similarly performed by [3, 4] with Anchor Points, where three arbitrary selected reference points on a 2D plane define the plane’s location in 3D space. Using the L2-norm to calculate the Euclidean distance between a predicted projection co-ordinate and the ground truth projection co-ordinate is appropriate, as the L2-norm is the appropriate metric. To the best of our knowledge, there is currently no loss function that respects the full Lie group structure of SE(3), for example invariant Riemannian metrics on SE(3) have not been used.

2 Method

The core of our method is to implement a new loss layer that is agnostic to the network architecture used: we define the loss as the geodesic distance on SE(3) equipped with a left-invariant Riemannian metric, shown in Fig. 1.

Left-invariant Riemannian Metric On SE(3): A Riemannian metric on SE(3) is a smooth collection of positive definite inner products on each tangent space of SE(3). Then, SE(3) becomes a Riemannian manifold. With a left-invariant metric, it is enough to define an inner product on the tangent space at the identity of SE(3), and then “propagate” it: the metric is s.t. $\forall u, v \in T_{p_1}SE(3)$ and $\forall p_1,p_2 \in SE(3)$: $<DL_{p_1}(p_2) u, DL_{p_1}(p_2) v>|_{L_{p_1} p_2} = <u, v>|_{p_2} $ where $L_{p_1}$ is the left translation by $p_1$: $L_{p_1}(p_2) = p_1 \circ p_2$, and $DL_{p_1}(p_2)$ its differential at $p_2$. We define an inner product Z at $p_2 = \text {identity}$ so that we get a metric $Z_{p_1}$ at the tangent space of any pose $p_1$ of SE(3) [12], and thus to compute inner products and norms of tangent vectors at $p_1$.

Loss and Gradient: We use the loss function: $ \text {loss}(\mathbf {p},\hat{\mathbf {p}}) = \text {dist}_{SE(3)}^Z(\mathbf {p},\hat{\mathbf {p}})^2 = {\left\| \text {Log}_{\hat{\mathbf {p}}}^{Z}(\mathbf {p}) \right\| }_{Z_{\hat{\mathbf {p}}}}^2 $ where $\text {dist}_{SE(3)}^Z$ is the geodesic distance and Log is the Riemannian logarithm at $\hat{\mathbf {p}}$ i.e. a tangent vector at $\hat{\mathbf {p}}$. We use a left-invariant Riemannian metric, thus: $\text {loss}(\mathbf {p},\hat{\mathbf {p}}) = ||DL_{\hat{\mathbf {p}}^{-1}}.\text {Log}_{\hat{\mathbf {p}}}^{Z}(\mathbf {p})||_{Z}^2$, where we now have a tangent vector at the identity and we can use the inner product Z. If we take Z being the canonical inner product at identity, this is the L2-norm but on the tangent vector transported from $\hat{\mathbf {p}}$ to identity using the differential $DL_{\hat{\mathbf {p}}^{-1}}$. The backward gradient corresponding to the loss seen as a function of $\hat{\mathbf {p}}$ is $ \nabla _{\hat{\mathbf {p}}} \text {loss}(\mathbf {p},\hat{\mathbf {p}}) = -2 \cdot \text {Log}_{\hat{\mathbf {p}}}^{Z}({\mathbf {p}}) $ [14] which is a tangent vector at $\hat{\mathbf {p}}$.

Implementation: The inputs to the loss layer are the poses $\mathbf {p}$ and $\hat{\mathbf {p}}$ for ground truth and prediction respectively. We represent a pose with geomstats implementation [11] i.e. as the Riemannian Logarithm of canonical left-invariant metric on SE(3) s.t. $ p = \{ r, t \} = \{ r_x , r_y , r_z , t_x , t_y , t_z \} \in \mathbb {R}^6$. With this parameterisation, the rotation r is in axis-angle parameterisation, the inner product Z is a 6$\,\times \,$6 positive definite matrix and the differential $DL_{\hat{p}}$ of the left translation is the 6$\,\times \,$6 jacobian matrix: $ J_{\hat{p}} = \begin{pmatrix} \frac{\partial {L_{\hat{p}}}^{r}}{\partial r} &{} \frac{\partial {L_{\hat{p}}}^{r}}{\partial t} \\ \frac{\partial {L_{\hat{p}}}^{t}}{\partial r} &{} \frac{\partial {L_{\hat{p}}}^{t}}{\partial t} \end{pmatrix} $. We denote $v_t = \text {Log}_{\hat{\mathbf {p}}}^{Z}(\mathbf {p})$ which is a tangent vector at $\hat{\mathbf {p}}$ in this parameterisation. The loss is calculated by $\text {loss}(\mathbf {p},\hat{\mathbf {p}}) = v_t^T *J_{\hat{p}^{-1}}^T *Z *J_{\hat{p}^{-1}} *v_t$ where $*$ is the matrix multiplication and the Riemannian logarithm $v_t$ is given by geomstats. The gradient is calculated by: $\nabla _{\hat{\mathbf {p}}} \text {loss}(\mathbf {p},\hat{\mathbf {p}})= -2 * J_{{\hat{p}}^{-1}}^T *Z *J_{{\hat{p}}^{-1}} * v_t$.

3 Experiments and Results

We evaluate our novel loss function on three datasets: (Exp1) the publicly available PoseNet dataset [7], which allows a direct comparison to state-of-the-art in Computer Vision and further evaluates optimisation strategies for these experiments. (Exp2), C-Arm X-Ray to Computed Tomography (CT) alignment problem with data from [4]. (Exp3), the pose estimation dataset for motion compensation in fetal Magnetic Resonance Imaging (MRI) from [3].

In each experiment, we benchmark existing SE(3) parameterisation strategies with the respective loss function used. PoseNet: direct regression of parameters on the Lie algebra $\mathfrak {se}(3)$, where a combination of quaternion and translation parameters are regressed using L2-norms with a static $\beta $ parameter to weight the respective contribution. Anchor Points formulation: a re-parameterisation of SE(3) in Euclidean space, where three statically defined points in 3D space defines a plane. Each Anchor Point is regressed independently using the L2-norm. Finally, our SE(3) loss, i.e., the geodesic distance on the Riemannian manifold.

All experiments are conducted using the Caffe framework, on a computer equipped with an Intel i7 6700K CPU and Nvidia Titan X Pascal GPU.

Exp.1: Metric Localisation on Natural Images: In this experiment, we replicated PoseNet’s original experiment [7] on the King’s College dataset as a baseline benchmark. [7] extracted images from a series of videos, and fed them into a structure from motion pipeline to create a 3D model in order to extract plane locations with respect to a world co-ordinate reference frame. The parameterisations of this dataset are quaternions with translation offsets. We mirrored the dataset using axis-angle representation instead of quaternions, and used our SE(3) loss function as regressor. Both networks were trained with a GoogLeNet [18] base architecture with no parameter weighting.

Table 1. Mean error of loss functions on natural images

Full size table

We convert the predicted and ground truth poses to Euler angles in degrees and translation in meters, along with the geodesic distance (G.D.) on the manifold. Table 1-A shows the average errors of each parameter. It can be seen that the error is similar in each Euler and translation parameters, which is confirmed by Student’s t-test to be insignificant. However, the geodesic distance error of SE(3) is much lower compared to PoseNet. Despite this, Student’s t-test still shows no significant difference, which is caused by the large variance of PoseNet.

To tune the weight parameter $\beta $, Kendall et al. [7] performed grid search and found that $\beta = [120,750]$ works best for indoor scenes and $\beta = [250,2000]$ for outdoor scenes. Grid search is computationally very expensive, and it can be difficult to find an optimal value if the search interval is coarse. We show here that we can compute a data-adaptive Riemannian metric on SE(3) to weight the contribution of each parameter in the loss.

We first train the network with no weightings, followed by an inference pass through the entire validation dataset. We compute the prediction error as the rigid transformation: ${(y_{true})}^{-1} \circ y_{pred}$ and consider the dataset of their Riemannian logarithms at the identity $\{X_i\}_i$. The parameter weightings are then calculated by $\text {diag}({\text {cov}(X_i)}^{-1})$. The diagonal of the covariance matrix shows the variance of each parameter, whereas the inverse shows how tightly coupled it is to the mean. Thus, the higher the diagonal element, the tighter the variable is clustered. Elements, that are more sparsely coupled, are weighted less as they are likely to induce errors. The optimal weightings for the King’s College dataset from [7] are: $ \text {diag}({\text {cov}(X_i)}^{-1}) = \{ 0.147 , 0.954 , 0.261 , 0.001 , 0.003 , 0.002 \} $.

Table 1-B shows the performance of the networks retrained with suggested weightings. Student’s T-test still shows no significant difference between errors induced by PoseNet and SE(3). We note that having different weightings on the rotation part induces a distance that is not a Riemannian geodesic distance anymore. The properties of this distance will be investigated in future work.

Exp.2: Plane Detection on C-ARM Imaging: [4] demonstrated the versatility of CNNs for performing 2D/3D registration of C-Arm X-ray images to CT volumes using Anchor Points. In this experiment, we replicate the 2D/3D registration task using CaffeNet and evaluate the performance with the newly proposed SE(3) parameterisation and loss regressor. For comparison, a network was also trained with PoseNet’s parameterisation. All weight parameters are set to a default of 1. Table 2 shows the performance of each parameterisation.

Table 2. Mean error of loss functions on DRR (Digitally reconstructed radiographs)

Full size table

Here, we convert the predicted and ground truth poses to Euler angles in degrees, and translation to millimetres. Similar to Exp1, the average error for rotation and translation (for both healthy and pathological patients) are similar, and insignificant as confirmed by Student’s t-test. However, there is a noticeable trend in average geodesic distance errors. Student’s t-test showed significant difference (marked by *) between SE(3) loss compared to PoseNet and Anchor Points for both datasets. This shows that the geodesic metric is able to quantify properties that the metric expressed in Euler-translation parameters cannot.

Exp.3: 2D/3D Registration on Fetal Brain MRI: We replicate the results evaluation method from [3], and evaluate our loss regressor for 2D/3D registration used during motion compensation of fetal MRI data in canonical organ space. [3] uses aligned, reconstructed 3D brain volumes to learn a canonical orientation space and utilises an approach based on GoogLeNet to reorient unseen 2D brain slices into their correct anatomical location in this space. To sample the canonical training space we use the same Euler iteration method (i.e., $18^\circ $ steps in $R_x$ $R_y$ $R_z$ between $-90^{\circ }$ and $+90^{\circ }$, and 2 mm offsets in $T_z$ constrained between $-40$ and 40) to generate 1.12M images for the training set. The evaluation method is performed similarly as [3], with the performance summarised in Table 3. The validation dataset is composed of brain slices sampled with random Euler angles between $-90^\circ $ and $+90^\circ $, and random offsets between $-40$ and +40.

Table 3. Mean error of loss functions on fetal brain images

Full size table

Our SE(3) loss function shows drastic improvement in all image similarity metrics (Cross Correlation, Mean Squared Error, Peak Signal-to-Noise Ratio and Structural SIMilarity). This is confirmed by Student’s t-test which shows significant difference. This is crucial for Slice-to-Volume Registration (SVR) applications as the metric for slice alignment is derived from the metrics above [6].

Discussion: A pose is a combination of rotation and translation, therefore it seems reasonable that a CNN predicting a pose should use a metric that accounts for both simultaneously. Metrics are perceptually a method of measurement with its own set of rules, e.g., imperial vs. metric system for quantifying distances. Choosing a metric for a target application is not always straight forward and often a question of required precision, e.g., one would not measure the diameter of a pinhead with a meter rule, nor measure distance between cities with a calliper. We have shown that our loss function, using a Riemannian geodesic distance on SE(3) is better suited for medical registration tasks as shown in Exp2 and Exp3. Exp2 shows each test case yielding no significant difference on Euler and translation parameters, with significant difference on geodesic parameters. This suggests that Euler-translation parameters separately are not able to fully quantify the properties of SE(3). In Exp3, our loss function was able to significantly improve the image similarity metrics, as used by SVR algorithms.

4 Conclusion

In this work, we have presented a novel loss function to regress poses on the Lie group SE(3), and derived the necessary gradients required for CNN training. We showed that our method alleviates the need of re-parameterising regression parameters, which addresses the domain shift problem of deep learning applications. Our approach achieves similar results to manually fine-tuned approximations out-of-the-box, e.g., for data from a new scanner. This is demonstrated on the current state-of-the-art for pose estimation, PoseNet, where we show that our method achieves similar performance as the carefully tuned approximation used in [7]. We also show significant improvements for medical image pose estimation and outperform the state-of-the-art in this domain [3, 4].

References

Aljabar, P.: Multi-atlas based segmentation of brain images: atlas selection and its effect on accuracy. NeuroImage 46(3), 726–738 (2009)
Article Google Scholar
Ghesu, F.C., Georgescu, B., Mansi, T., Neumann, D., Hornegger, J., Comaniciu, D.: An artificial agent for anatomical landmark detection in medical images. In: Ourselin, S., Joskowicz, L., Sabuncu, M.R., Unal, G., Wells, W. (eds.) MICCAI 2016. LNCS, vol. 9902, pp. 229–237. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46726-9_27
Chapter Google Scholar
Hou, B., et al.: 3D reconstruction in canonical co-ordinate space from arbitrarily oriented 2D images. IEEE Trans. Med. Imaging PP(99), 1 (2018)
Google Scholar
Hou, B., et al.: Predicting slice-to-volume transformation in presence of arbitrary subject motion. In: Descoteaux, M., Maier-Hein, L., Franz, A., Jannin, P., Collins, D.L., Duchesne, S. (eds.) MICCAI 2017. LNCS, vol. 10434, pp. 296–304. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-66185-8_34
Chapter Google Scholar
Huynh, D.Q.: Metrics for 3D rotations: comparison and analysis. J. Math. Imaging Vis. 35(2), 155–164 (2009)
Article MathSciNet Google Scholar
Kainz, B.: Fast volume reconstruction from motion corrupted stacks of 2D slices. IEEE Trans. Med. Imaging 34(9), 1901–13 (2015)
Article Google Scholar
Kendall, A., et al.: Posenet: a convolutional network for real-time 6-DOF camera relocalization. In: ICCV, pp. 2938–2946 (2015)
Google Scholar
Kendall, A., Cipolla, R.: Geometric loss functions for camera pose regression with deep learning. In: Proceedings of the IEEE CVPR (2017)
Google Scholar
Liao, R., et al.: An artificial agent for robust image registration. In: AAAI, pp. 4168–4175 (2017)
Google Scholar
Miao, S.: A CNN regression approach for real-time 2D/3D registration. IEEE Trans. Med. Imaging 35(5), 1352–1363 (2016)
Article Google Scholar
Miolane, N., Mathe, J., Donnat, C., Jorda, M., Pennec, X.: Geomstats: a python package for riemannian geometry in machine learning (2018). https://github.com/ninamiolane/geomstats, https://arxiv.org/abs/1805.08308
Miolane, N., Pennec, X.: Computing bi-invariant pseudo-metrics on lie groups for consistent statistics. Entropy 17(4), 1850–1881 (2015)
Article MathSciNet Google Scholar
Pei, Y., et al.: Non-rigid craniofacial 2D-3D registration using CNN-based regression. In: Cardoso, M.J., et al. (eds.) DLMIA/ML-CDS -2017. LNCS, vol. 10553, pp. 117–125. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-67558-9_14
Chapter Google Scholar
Pennec, X.: Probabilities and statistics on riemannian manifolds: basic tools for geometric measurements. In: NSIP, pp. 194–198. Citeseer (1999)
Google Scholar
Rousseau, F.: Registration-based approach for reconstruction of high-resolution in utero fetal MR brain images. Acad. Radiol. 13(9), 1072–1081 (2006)
Article Google Scholar
Sarkis, M., Diepold, K.: Camera-pose estimation via projective newton optimization on the manifold. IEEE Trans. Image Process. 21(4), 1729–1741 (2012)
Article MathSciNet Google Scholar
Sloan, J., et al.: Learning rigid image registration - utilizing cnns for medical image registration (2018). http://eprints.gla.ac.uk/156798/
Szegedy, C., et al.: Going deeper with convolutions. In: IEEE CVPR 2015 (2015)
Google Scholar
Zitová, B., Flusser, J.: Image registration methods: a survey. Image Vis. Comput. 21(11), 977–1000 (2003)
Article Google Scholar

Download references

Author information

Authors and Affiliations

Imperial College London, London, UK
Benjamin Hou, Bishesh Khanal, Matthew C. H. Lee, Amir Alansary, Steven McDonagh, Daniel Rueckert, Ben Glocker & Bernhard Kainz
INRIA and Stanford, Stanford, USA
Nina Miolane
King’s College London, London, UK
Bishesh Khanal & Jo V. Hajnal
HeartFlow, Redwood City, USA
Matthew C. H. Lee

Authors

Benjamin Hou
View author publications
You can also search for this author in PubMed Google Scholar
Nina Miolane
View author publications
You can also search for this author in PubMed Google Scholar
Bishesh Khanal
View author publications
You can also search for this author in PubMed Google Scholar
Matthew C. H. Lee
View author publications
You can also search for this author in PubMed Google Scholar
Amir Alansary
View author publications
You can also search for this author in PubMed Google Scholar
Steven McDonagh
View author publications
You can also search for this author in PubMed Google Scholar
Jo V. Hajnal
View author publications
You can also search for this author in PubMed Google Scholar
Daniel Rueckert
View author publications
You can also search for this author in PubMed Google Scholar
Ben Glocker
View author publications
You can also search for this author in PubMed Google Scholar
Bernhard Kainz
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Benjamin Hou .

Editor information

Editors and Affiliations

University of Leeds, Leeds, UK
Alejandro F. Frangi
King’s College London, London, UK
Julia A. Schnabel
University of Pennsylvania, Philadelphia, PA, USA
Christos Davatzikos
Universidad de Valladolid, Valladolid, Spain
Carlos Alberola-López
Queen’s University, Kingston, ON, Canada
Gabor Fichtinger

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Hou, B. et al. (2018). Computing CNN Loss and Gradients for Pose Estimation with Riemannian Geometry. In: Frangi, A., Schnabel, J., Davatzikos, C., Alberola-López, C., Fichtinger, G. (eds) Medical Image Computing and Computer Assisted Intervention – MICCAI 2018. MICCAI 2018. Lecture Notes in Computer Science(), vol 11070. Springer, Cham. https://doi.org/10.1007/978-3-030-00928-1_85

Download citation

DOI: https://doi.org/10.1007/978-3-030-00928-1_85
Published: 26 September 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-00927-4
Online ISBN: 978-3-030-00928-1
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics

Computing CNN Loss and Gradients for Pose Estimation with Riemannian Geometry

Abstract

Similar content being viewed by others

Procrustean Regression Networks: Learning 3D Structure of Non-rigid Objects from 2D Annotations

Towards improvement of baseline performance for regression based human pose estimation

HDM-Net: Monocular Non-rigid 3D Reconstruction with Learned Deformation Model

1 Introduction

2 Method

3 Experiments and Results

4 Conclusion

References

Author information

Authors and Affiliations

Corresponding author

Editor information

Editors and Affiliations

Rights and permissions

Copyright information

About this paper

Cite this paper

Download citation

Publish with us

Navigation

Computing CNN Loss and Gradients for Pose Estimation with Riemannian Geometry

Abstract

Similar content being viewed by others

Procrustean Regression Networks: Learning 3D Structure of Non-rigid Objects from 2D Annotations

Towards improvement of baseline performance for regression based human pose estimation

HDM-Net: Monocular Non-rigid 3D Reconstruction with Learned Deformation Model

1 Introduction

2 Method

3 Experiments and Results

4 Conclusion

References

Author information

Authors and Affiliations

Corresponding author

Editor information

Editors and Affiliations

Rights and permissions

Copyright information

About this paper

Cite this paper

Download citation

Share this paper

Publish with us

Search

Navigation