Computers & Graphics, Volume 71, April 2018, Pages 113-123
Special Section on Cyberworlds 2017

Distributed monocular visual SLAM as a basis for a collaborative augmented reality framework

https://doi.org/10.1016/j.cag.2018.01.002

Highlights

  • Successfully used distributed visual SLAM for a collaborative AR framework.

  • Used an appearance-based method to find the overlap between two maps.

  • Merged maps of nodes without initial knowledge of their relative locations.

  • Proposed a quality measure to determine the best candidate from state-of-the-art features.

  • Developed distributed visual SLAM datasets and evaluated the proposed framework.

Abstract

Visual Simultaneous Localization and Mapping (SLAM) has been used for markerless tracking in augmented reality applications. Distributed SLAM enables multiple agents to collaboratively explore and build a global map of the environment while estimating their own locations in it. One of the main challenges in distributed SLAM is to identify the local map overlaps of these agents, especially when their initial relative positions are unknown. We developed a collaborative AR framework whose freely moving agents have no knowledge of their initial relative positions. Each agent in our framework uses a camera as the only input device for its SLAM process. Furthermore, the framework identifies map overlaps between agents using an appearance-based method. We also proposed a quality measure to determine the best keypoint detector/descriptor combination for our framework.

Introduction

Markerless tracking has been a goal of many augmented reality applications, and Simultaneous Localization and Mapping (SLAM) provides a robust framework to accomplish it. The robotics community defines the SLAM problem as an agent creating a map of an unknown environment using its sensors while localizing itself within that map. Localizing the agent accurately requires an accurate map, and generating an accurate map requires accurate localization. Localization and mapping therefore need to be performed simultaneously so that each benefits the other.

Inexpensive, ubiquitous cameras on mobile agents, together with mature image processing tools, have made vision a popular sensing choice for SLAM. Most visual SLAM approaches rely on detecting features and use them to generate sparse maps. More recent solutions based on direct, featureless methods [1] generate semi-dense maps of the environment. Denser maps provide many benefits over sparse maps, including better agent interaction with the environment and its objects, better scene interaction for augmented reality applications, and better object recognition from the richer data. In practice, however, direct, featureless methods require significant overlap between keyframes, i.e., narrower baselines, which limits the allowable camera motion. Furthermore, the direct method alone cannot handle large loop closures.

Many researchers have investigated how to use multiple agents to perform SLAM, an approach called collaborative or distributed SLAM. Distributed SLAM increases the robustness of the SLAM process and makes it less vulnerable to catastrophic failures. The main challenges in distributed SLAM are computing map overlaps and sharing information between agents over limited communication bandwidth.

We developed a collaborative augmented reality framework based on distributed SLAM. Agents in our framework have no prior knowledge of their relative positions. Each agent generates a local semi-dense map using a direct, featureless SLAM approach. The framework uses image features in keyframes to determine map overlaps between agents. We performed a comprehensive analysis of state-of-the-art keypoint detector/descriptor combinations to improve the performance of the system reported in [2], defining a quality measure to find the optimal combination. We created the publicly available DIST-Mono distributed monocular visual SLAM dataset to evaluate our system. Furthermore, we developed a proof-of-concept augmented reality application to demonstrate the potential of our framework.


Related work

In a seminal paper, Smith et al. [3] introduced an Extended Kalman Filter (EKF) based solution to the SLAM problem (EKF-SLAM). The EKF incrementally estimates the posterior distribution over the agent pose and landmark positions. The covariance matrix grows with the number of landmarks, and even a single landmark observation triggers an update of the full covariance matrix, so excessive computational cost limits the number of landmarks EKF-SLAM can handle. Furthermore, EKF-SLAM relies on Gaussian noise assumptions, which limit its robustness under the nonlinearities of real motion and observation models.
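To make the cost argument concrete, here is a brief sketch, assuming a standard planar parameterization (a 3-DoF pose and 2-D landmarks) purely for illustration:

```latex
% EKF-SLAM state: agent pose \xi and n landmarks m_1, ..., m_n
x = \begin{pmatrix} \xi \\ m_1 \\ \vdots \\ m_n \end{pmatrix},
\qquad
\Sigma \in \mathbb{R}^{(3+2n)\times(3+2n)}
% Observing a single landmark updates the full covariance \Sigma,
% so each EKF update costs O(n^2) in the number of landmarks.
```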

System overview

Our framework consists of two types of distributed nodes: exploring nodes and a monitoring node. These nodes are deployed on different physical machines, and each is given a globally unique identifier. The framework has one monitoring node and multiple exploring nodes at any given time. The nodes use communication channels to pass messages to each other.

We use the Robot Operating System (ROS) [21] infrastructure for our framework. In ROS, nodes are the processes responsible for performing computation. We implement the exploring nodes and the monitoring node as ROS nodes.
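As a minimal sketch of this architecture, an exploring node could publish its keyframes to the monitoring node over a ROS topic. The topic name and message type below are illustrative assumptions, not the framework's actual interface:

```python
# Minimal rospy sketch of an exploring node publishing keyframes.
# The topic name and the use of sensor_msgs/Image are assumptions;
# the actual framework defines its own message types.
import rospy
from sensor_msgs.msg import Image

def main():
    rospy.init_node('exploring_node_0')  # globally unique identifier
    pub = rospy.Publisher('/exploring_node_0/keyframes', Image, queue_size=10)
    rate = rospy.Rate(1)  # keyframes arrive far less often than raw frames
    while not rospy.is_shutdown():
        keyframe = Image()  # placeholder; a real node fills in image data
        pub.publish(keyframe)
        rate.sleep()

if __name__ == '__main__':
    main()
```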

Exploring node

Each exploring node performs semi-dense visual SLAM based on the work in [22]. It uses a single camera as its only input device and maintains a list of keyframes and a pose graph to represent its local map.
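The local map can be pictured with a structure like the following, a hypothetical sketch whose field names are ours but whose contents follow the keyframe definition given later in the paper:

```python
class Keyframe(object):
    """Hypothetical container following the paper's keyframe definition."""
    def __init__(self, pose, image, inv_depth, inv_depth_var, features):
        self.pose = pose                    # xi_Wi: absolute pose (e.g. 4x4 matrix)
        self.image = image                  # I_i: grayscale image
        self.inv_depth = inv_depth          # D_i: per-pixel inverse depth
        self.inv_depth_var = inv_depth_var  # V_i: per-pixel inverse depth variance
        self.features = features            # F_i: list of (x, y) feature locations

class LocalMap(object):
    def __init__(self):
        self.keyframes = []  # list of Keyframe
        self.edges = []      # pose-graph edges: (i, j, relative pose constraint)
```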

Monitoring node

The exploring nodes of our distributed framework do not know their relative poses at the beginning. The monitoring node's map overlap detection module is responsible for detecting overlaps and computing the corresponding relative poses between nodes. It also detects loop closures for each exploring node.

The monitoring node maintains N keyframe databases DB_i, where N equals the number of exploring nodes in the framework. All incoming keyframes K_i are matched against all of these databases; a match against the sending node's own database indicates a loop closure, while a match against another node's database indicates a map overlap.
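A hedged sketch of this appearance-based matching follows, using ORB for illustration (the paper evaluates several detector/descriptor combinations, and the match-distance cutoff here is an assumption):

```python
# Match an incoming keyframe image against per-node keyframe databases.
import cv2

orb = cv2.ORB()  # OpenCV 2.4.x API; on OpenCV 3+ use cv2.ORB_create()
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)

def best_matching_keyframe(incoming_img, databases):
    """databases: dict mapping node id -> list of keyframe images (DB_i)."""
    _, query_des = orb.detectAndCompute(incoming_img, None)
    best = (None, None, 0)  # (node id, keyframe index, good-match count)
    for node_id, db in databases.items():
        for idx, kf_img in enumerate(db):
            _, train_des = orb.detectAndCompute(kf_img, None)
            if query_des is None or train_des is None:
                continue
            matches = matcher.match(query_des, train_des)
            good = [m for m in matches if m.distance < 50]  # assumed cutoff
            if len(good) > best[2]:
                best = (node_id, idx, len(good))
    return best
```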

Determining overlap between two maps

Fig. 8 is a flowchart that describes how the overlap between two maps is determined. As discussed earlier, each map in our framework is represented by a set of keyframes and a pose graph. The i-th keyframe K_i consists of an absolute pose ξ_Wi, an image I_i, an inverse depth map D_i, an inverse depth variance map V_i, and a list of features F_i. Each feature in F_i is filtered on its V_i(x_p) value to determine its saliency, where x_p is the location of the feature.

The p-th feature in K_i should satisfy V_i(x_p) < V_max, where V_max is a threshold on the inverse depth variance.
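A minimal sketch of this saliency filter, reusing the hypothetical Keyframe sketch above (the threshold value is an illustrative assumption; the paper's actual value is not shown in this snippet):

```python
V_MAX = 0.01  # hypothetical inverse depth variance threshold

def salient_features(keyframe):
    """Keep only features whose inverse depth estimate is reliable."""
    kept = []
    for (x, y) in keyframe.features:
        if keyframe.inv_depth_var[y, x] < V_MAX:  # the V_i(x_p) test
            kept.append((x, y))
    return kept
```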

Public datasets

To evaluate our system, we need a monocular visual SLAM dataset with multiple trajectories covering a single scene. We considered publicly available datasets, but they did not satisfy our requirements. For example, the EuRoC dataset [34] contains pure rotations, which do not work well with the monocular SLAM approach we used. Kitti [35] is mainly a stereo dataset; even when we considered a single camera, the direct monocular SLAM process failed since the camera motion is along the optical axis.

System implementation

We developed the exploring nodes and monitoring node as ROS nodes, using the ROS Indigo Igloo infrastructure on the Ubuntu 14.04 LTS (Trusty) operating system. For both the framework implementation and the comprehensive analysis of state-of-the-art feature detector and descriptor combinations, we used version 2.4.8 of the OpenCV library.

Nodes in the framework communicate with each other using ROS topics. We used ROS statistics to measure bandwidth utilization on those communication channels.
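As a sketch of how such measurements can be read, ROS publishes per-connection statistics on the /statistics topic once the /enable_statistics parameter is set; the monitoring logic below is our own illustration:

```python
# Estimate per-topic bandwidth from rosgraph_msgs/TopicStatistics messages.
# Note: /enable_statistics must be set before the monitored nodes start.
import rospy
from rosgraph_msgs.msg import TopicStatistics

def on_stats(msg):
    window = (msg.window_stop - msg.window_start).to_sec()
    if window > 0:
        rate = msg.traffic / window  # bytes delivered per second in window
        rospy.loginfo('%s: %.1f B/s', msg.topic, rate)

rospy.init_node('bandwidth_monitor')
rospy.set_param('/enable_statistics', True)
rospy.Subscriber('/statistics', TopicStatistics, on_stats)
rospy.spin()
```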

AR application

We added an AR window to each exploring node to test our framework. The AR window allows users to add a virtual object (a simple cube, in our example) into the node's map, which lets us demonstrate the collaborative AR potential of the distributed SLAM framework. Each exploring node has its own local map, so it can render the augmented scene from its viewpoint. Because the node also knows its pose in the global map, it can render objects added by the other exploring nodes as well.
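A hedged sketch of the underlying transform arithmetic: an object placed by another node is mapped into the local frame through the relative pose recovered during map merging. All matrices and names here are illustrative placeholders:

```python
import numpy as np

def object_pose_in_local_frame(T_local_from_other, T_object_in_other):
    """Compose rigid-body transforms given as 4x4 homogeneous matrices."""
    return np.dot(T_local_from_other, T_object_in_other)

# Example: node B placed a cube at T_cube in B's local frame; node A knows
# T_A_from_B from map overlap detection and can now draw the same cube.
T_A_from_B = np.eye(4)  # placeholder relative pose between the two maps
T_cube = np.eye(4)      # placeholder object pose in B's frame
T_cube_in_A = object_pose_in_local_frame(T_A_from_B, T_cube)
```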

Conclusion

In this paper, we introduced a distributed SLAM framework that identifies map overlaps using an appearance-based method. For this method, we performed a comprehensive analysis of state-of-the-art keypoint detectors and descriptors and introduced a quality measure to select the best combination for a distributed visual SLAM framework. The framework operates with no prior knowledge of the relative starting poses of its nodes. Using an AR application, we have shown that our framework can serve as a basis for collaborative augmented reality.

References (38)

  • D.G. Lowe

    Distinctive image features from scale-invariant keypoints

    Int J Comput Vision

    (2004)
  • A. Alahi et al.

    FREAK: Fast retina keypoint

    Proceedings of the 2012 IEEE conference on computer vision and pattern recognition (CVPR)

    (2012)
  • J. Engel et al.

    LSD-SLAM: large-scale direct monocular SLAM

    Computer Vision - ECCV 2014, Lecture Notes in Computer Science

    (2014)
  • R. Egodagamage et al.

    A collaborative augmented reality framework based on distributed visual SLAM

    Proceedings of the international conference on cyberworlds (CW)

    (2017)
  • R. Smith et al.

    Estimating uncertain spatial relationships in robotics

  • M. Montemerlo et al.

    FastSLAM: A factored solution to the simultaneous localization and mapping problem

    Proceedings of the AAAI national conference on artificial intelligence

    (2002)
  • A. Davison et al.

    MonoSLAM: real-time single camera SLAM

    IEEE Trans Pattern Anal Mach Intell

    (2007)
  • G. Klein et al.

    Parallel tracking and mapping for small AR workspaces

    Proceedings of the 2007 6th IEEE and ACM international symposium on mixed and augmented reality

    (2007)
  • M.A. Fischler et al.

    Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography

    Commun ACM

    (1981)
  • R.I. Hartley et al.

    Multiple View Geometry in Computer Vision

    (2004)
  • B. Triggs et al.

    Bundle adjustment: a modern synthesis

    Vision algorithms: theory and practice

    (2000)
  • H. Strasdat et al.

    Real-time monocular SLAM: Why filter?

    Proceedings of the IEEE international conference on robotics and automation (ICRA)

    (2010)
  • R.A. Newcombe et al.

    DTAM: Dense tracking and mapping in real-time

    Proceedings of the 2011 IEEE international conference on computer vision (ICCV)

    (2011)
  • E. Nettleton et al.

    Decentralised SLAM with low-bandwidth communication for teams of vehicles

    Field and Service Robotics

    (2006)
  • L. Paull et al.

    Communication-constrained multi-AUV cooperative SLAM

    Proceedings of the 2015 IEEE international conference on robotics and automation (ICRA)

    (2015)
  • A. Howard et al.

    The SDR experience: experiments with a large-scale heterogeneous mobile robot team

  • D. Fox et al.

    Distributed multirobot exploration and mapping

    Proc IEEE

    (2006)
  • H. Bay et al.

    Speeded-up robust features (SURF)

    Comput Vis Image Underst

    (2008)
  • E. Rublee et al.

    ORB: An efficient alternative to SIFT or SURF

    Proceedings of the 2011 IEEE international conference on computer vision (ICCV)

    (2011)