Keywords
bioimage informatics, quantitative geometry, computational geometry, workflow, python
This article is included in the NEUBIAS - the Bioimage Analysts Network gateway.
This article is included in the Bioinformatics gateway.
This article is included in the Python collection.
Small updates to answer reviewer comments include removal of reference to 'large scale', a clarification of reasoning behind licensing choices and removal of mention of alignment algorithms. We also updated the title, to remove mention of large scale, and updated affiliation of one of the authors.
See the authors' detailed response to the review by Virginie Uhlmann
Bioimage informatics aims to bring microscopy into quantitative biology by associating higher-level information with pixels to answer complex biological questions. In particular, machine learning-based techniques1 are easing the image analysis step of extracting geometrical objects from multidimensional images. But the next step, transforming that geometrical information into biological knowledge, involves a very diverse set of algorithmic tools from distinct communities, from spatial statistics2,3 to computational geometry4,5 or neuroinformatics6. Similarly, the software ecosystem around geometrical data analysis is heterogeneous: reference algorithm implementations are spread across languages (Spatstat7 for spatial statistics in R, CGAL8 for computational geometry in C++) or across modules in Python (scipy9 for generic algorithms, anytree10 for trees, trimesh11 for meshes, etc.), there is no generic exchange format for geometric data, and standard graphical tools like Fiji12 and Icy13 offer limited flexibility in the analyses that are easily available. To address this problem, we propose GeNePy3D14,15, a Python library meant as a 'middleware' library to facilitate building data analysis workflows for geometrical objects by providing one convenient API for geometrical data I/O, conversion and interaction between geometrical objects, and access to many common and less common algorithms. Below, we introduce the architecture of the library and show one example workflow, re-analysing a published dataset of zebrafish brain neuronal traces by combining traces and brain regions to extract quantitative metrics per region.
GeNePy3D14,15 was designed with any computationally minded life scientist as the target user, to provide a simple and homogeneous API. GeNePy3D consists of four main objects (Figure 1) corresponding to four basic geometrical objects of interest: Points (cells or intracellular object positions...), Curve (particle tracks, neurite branches, microtubules...), Tree (neuronal traces, dividing cell tracks) and Surface (cell surface or other tissue-level structure...). Each of them has its own attributes, functions and I/O. We provide ways to transform between them (decomposing a Tree into sequences of Curve, or converting Points into the Surface that encloses them). Interactions between objects of the same or different classes are also available (optimal transport-based distance between two Points, intersection between Curve and Surface, etc.). Altogether, GeNePy3D offers a unified and seamless way to analyse complex geometrical biological data.
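To make the Tree→Points and Tree→Curve transformations concrete, here is a minimal sketch in plain Python. It is not the GeNePy3D API: the `nodes` dictionary, `children_map`, `branching_points` and `decompose` are hypothetical names introduced only for illustration, and the toy tree is invented.

```python
from collections import defaultdict

# Hypothetical toy tree: node id -> (parent id, (x, y, z)); the root has parent -1.
nodes = {
    1: (-1, (0.0, 0.0, 0.0)),
    2: (1, (1.0, 0.0, 0.0)),
    3: (2, (2.0, 0.0, 0.0)),   # branching point: two children (4 and 5)
    4: (3, (3.0, 1.0, 0.0)),
    5: (3, (3.0, -1.0, 0.0)),
    6: (5, (4.0, -1.0, 0.0)),
}


def children_map(nodes):
    """Build parent id -> list of child ids."""
    children = defaultdict(list)
    for node_id, (parent, _) in nodes.items():
        if parent in nodes:
            children[parent].append(node_id)
    return children


def branching_points(nodes):
    """Tree -> Points: coordinates of nodes with at least two children."""
    children = children_map(nodes)
    return [nodes[n][1] for n in sorted(children) if len(children[n]) >= 2]


def decompose(nodes):
    """Tree -> Curves: node-id paths running between roots, branching points and leaves."""
    children = children_map(nodes)
    roots = [n for n, (parent, _) in nodes.items() if parent not in nodes]
    sections = []
    stack = [(root, [root]) for root in roots]
    while stack:
        node, path = stack.pop()
        kids = children.get(node, [])
        if len(kids) == 1:
            # Still inside a section: extend the current path.
            stack.append((kids[0], path + [kids[0]]))
        else:
            # Leaf or branching point: close the current section.
            if len(path) > 1:
                sections.append(path)
            # Each child of a branching point starts a new section.
            for kid in kids:
                stack.append((kid, [node, kid]))
    return sections


points = branching_points(nodes)
curves = decompose(nodes)
```

Each section shares its first node with the branching point it starts from, so per-section measurements such as length add up to the length of the whole tree.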
GeNePy3D is implemented in Python, taking advantage of a high-level programming language with simple syntax and many open-source packages. We reused algorithms and functions available from various recognised packages when possible, and developed our own implementations when needed, all within a single interface. Most of the packages we link to are available from the Python Package Index (PyPI) and can be easily installed via the Python package manager (pip). Figure 1 lists some functions, with colors denoting the package used. Beyond standard packages, more specific ones include AnyTree for tree manipulation, TriMesh for surface manipulation and ScikitLearn for machine learning tasks. Other features are listed as optional, as they come from sources that are harder to install or less widely recognised, including the C++ library CGAL, only partially available in Python, for generic object interaction in 3D, and the optimal transport method implemented in PyEMD. Original developments available in GeNePy3D include an algorithm to compute local 3D scale that we recently published16. Many common input/output formats are supported, including SWC for Tree, CSV and XYZ for Points/Curve, and STL and OFF for Surface. We release the library in two packages for licensing reasons (see licenses below).
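As an illustration of the SWC format mentioned above, the following sketch parses SWC text with the standard library only. It assumes the usual seven-column layout (id, type, x, y, z, radius, parent id, with -1 marking the root) and `#` comment lines. The `read_swc` function and the toy values are hypothetical and do not reproduce GeNePy3D's own reader.

```python
import io

# Hypothetical toy neuron in SWC layout: id type x y z radius parent
SWC_TEXT = """\
# toy neuron, invented values
1 1 0.0 0.0 0.0 1.0 -1
2 3 1.0 0.0 0.0 0.5 1
3 3 2.0 1.0 0.0 0.5 2
4 3 2.0 -1.0 0.0 0.5 2
"""


def read_swc(handle):
    """Parse SWC lines into {node id: {"type", "xyz", "radius", "parent"}}."""
    nodes = {}
    for line in handle:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        node_id, node_type = int(fields[0]), int(fields[1])
        x, y, z, radius = (float(value) for value in fields[2:6])
        nodes[node_id] = {
            "type": node_type,
            "xyz": (x, y, z),
            "radius": radius,
            "parent": int(fields[6]),
        }
    return nodes


tree = read_swc(io.StringIO(SWC_TEXT))
```

The same function accepts an open file handle, for example `with open(path) as handle: tree = read_swc(handle)`, where `path` points to an SWC file.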
GeNePy3D works with Python 3.6. Details of the specific software requirements, documentation including the installation instructions, and example Python notebooks can be accessed via https://genepy3d.gitlab.io. Example pipelines using GeNePy3D are run using Jupyter notebooks. To ease the use and deployment of GeNePy3D, we provide ready-to-use docker containers at https://gitlab.com/genepy3d/genepy3d_dockers.
To exemplify the use of GeNePy3D14,15, we reanalyzed a recently published dataset containing up to 2000 traced neurons across the whole brain of larval zebrafish17. The authors annotated 36 symmetric regions and established a connectivity atlas for the neurons within these regions. Figure 2A illustrates a possible workflow using GeNePy3D for reanalyzing the dataset. The inputs consist of neuronal traces in SWC format and a 3D volume in NRRD format containing annotated labels for the 36 brain regions. The traces are imported into GeNePy3D as Tree objects, while the regions are reconstructed into Surface objects using the marching cubes algorithm. Figure 2B top illustrates the outline of the Tectum along with all neuronal traces arriving in this brain region. We then extracted branching point positions from the neuronal traces (Tree→Points), decomposed the traces into sections (Tree→Curves) and checked whether the branching points or curve sections lie within or outside each region (interaction with Surface). Examples of decomposing the traces and computing sections inside and outside the Tectum region are shown in Figure 2B bottom. Finally, we measured, within the brain regions, neuronal lengths, number of branching points, tortuosities (ratio of curve length to the distance between the two end points of the curve), and local 3D scales16 (scale at which the curve transforms to 3D).
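The length and tortuosity measures described above can be sketched in a few lines of plain Python. This is an illustration of the definitions only, not the GeNePy3D implementation. The function names and the toy curves are hypothetical, and returning infinity for a closed curve is a convention chosen here.

```python
import math


def distance(p, q):
    """Euclidean distance between two 3D points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def curve_length(points):
    """Sum of segment lengths along a polyline given as a list of (x, y, z)."""
    return sum(distance(p, q) for p, q in zip(points, points[1:]))


def tortuosity(points):
    """Curve length divided by the straight-line distance between its end points."""
    chord = distance(points[0], points[-1])
    if chord == 0:
        return math.inf  # closed curve: the ratio is undefined, flagged as infinite here
    return curve_length(points) / chord


# Hypothetical toy curves: a straight section and a bent one.
straight = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
bent = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (6.0, 0.0, 0.0)]

straight_tortuosity = tortuosity(straight)  # 1.0 for a straight line
bent_length = curve_length(bent)            # two segments of length 5
bent_tortuosity = tortuosity(bent)          # 10 / 6
```

A tortuosity of 1 corresponds to a perfectly straight section, and larger values indicate more winding paths between the same two end points.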
Part of the resulting quantification is shown in Figure 2C. The top graph shows a longer neuronal length on average for groups of neurons arriving in and originating from the regions compared to those passing through. Figure 2C bottom shows a map of the average neuronal length in each brain region for arriving neurons, showing that neurons coming from fore- and midbrain are much longer than those from hindbrain. Details of all processing steps and additional quantified results can be found at https://gitlab.com/genepy3d/genepy3d_examples/-/tree/master/zebrafish_atlas.
The advent of machine learning and developments in biological imaging are leading to numerous geometrical datasets, and GeNePy3D14,15 aims to enable complex analysis workflows based on those objects. But as in other aspects of bioimage informatics, the key will be for the community to work together and define common formats and structures for regions of interest and geometric objects, to ease the interactions between the various visualisation, data management and analysis tools and to convert raw images into biological knowledge. GeNePy3D is ready to become a component of that ecosystem.
The data used for Figure 2 have been published at https://fishatlas.neuro.mpg.de. To download the traces, we chose 'single axons', then 'connect without logging in', then 'Kunst et al. 2019' in publications; once all neurons are loaded, the download option appears.
GeNePy3D is hosted at https://genepy3d.gitlab.io and is easily installable from PyPI using pip.
Source code available at: https://gitlab.com/genepy3d.
Archived source code at time of publication:
GeNePy3D: https://doi.org/10.5281/zenodo.4269466 (ref. 14).
GeNePy3D_GPL: https://doi.org/10.5281/zenodo.4269484 (ref. 15).
License: The library is distributed as two packages. The main package GeNePy3D14 is under a BSD 3-Clause Licence, while features that necessitate linking to GPL-licensed code are distributed separately in GeNePy3D_GPL15, under the GNU General Public License v3.0.
We wanted to release GeNePy3D under a BSD license but could not avoid the use of some GPL-licensed software, which forced us to adopt this two-package solution. Practical consequences should be minimal in most circumstances thanks to modern Python package management.
The source code for the analysis of Figure 2 is available at https://gitlab.com/genepy3d/genepy3d_examples/-/tree/master/zebrafish_atlas.
The authors wish to acknowledge Jean Livet, Emmanuel Beaurepaire and Katherine Matho for fruitful discussions and the collaboration that gave the incentive to develop this library, and Debora Keller-Olivier for reading the manuscript. This publication was supported by COST Action NEUBIAS (CA15124), funded by COST (European Cooperation in Science and Technology).
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: Bioimage analysis
Is the rationale for developing the new software tool clearly explained?
Yes
Is the description of the software tool technically sound?
Yes
Are sufficient details of the code, methods and analysis (if applicable) provided to allow replication of the software development and its use by others?
Yes
Is sufficient information provided to allow interpretation of the expected output datasets and any results generated using the tool?
Yes
Are the conclusions about the tool and its performance adequately supported by the findings presented in the article?
Yes
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: Bioimage informatics, machine learning
Is the rationale for developing the new software tool clearly explained?
Yes
Is the description of the software tool technically sound?
Yes
Are sufficient details of the code, methods and analysis (if applicable) provided to allow replication of the software development and its use by others?
Partly
Is sufficient information provided to allow interpretation of the expected output datasets and any results generated using the tool?
Yes
Are the conclusions about the tool and its performance adequately supported by the findings presented in the article?
Yes
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: Bioimage analysis
Version | Invited Reviewer 1 | Invited Reviewer 2 |
---|---|---|
Version 2 (revision), 17 Jun 21 | read | |
Version 1, 26 Nov 20 | read | read |