Research article · DOI: 10.1145/3408127.3408178

VESS: Variable Event Stream Structure for Event-based Instance Segmentation Benchmark

Published: 10 September 2020

ABSTRACT

Compared with the traditional frame-based camera, the event camera (also known as a dynamic vision sensor) has received increasing attention due to its outstanding advantages. Inspired by biology, the camera naturally captures the dynamics of a scene with low latency, filtering out redundant information at low power consumption. Deep-learning-based instance segmentation, an influential line of research in visual recognition, could potentially exploit these benefits, but event-based applications combined with deep learning still face several challenges. In this work, we develop event-based instance segmentation that unlocks the potential of event data by combining the event camera with deep learning. To make the best of the event data, we propose a novel event representation method, the variable event stream structure (VESS), for event-based instance segmentation. Because event-based datasets are rare and none of them contains instance segmentation labels, we also produce accurate labels specialized for instance segmentation on the event camera. The proposed method is verified on this dataset: our approach reaches an average Intersection over Union (IoU) of 55.75% in real time and works properly in challenging conditions such as motion blur and extreme lighting.
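For readers unfamiliar with event data, the short Python/NumPy sketch below illustrates two ideas the abstract relies on: turning an asynchronous event stream into a dense, frame-like tensor that a segmentation network can consume, and scoring a predicted instance mask against ground truth with IoU. The fixed-window, two-channel accumulation shown here is a common generic representation used only for illustration; it is not the paper's VESS structure, whose definition appears in the full text. The (x, y, t, p) event fields follow the standard dynamic-vision-sensor output.

import numpy as np

def events_to_frame(events, height, width):
    # `events` is an (N, 4) float array of (x, y, t, p) rows: pixel
    # coordinates, timestamp, and polarity (+1 = brighter, -1 = darker).
    # Returns a (2, H, W) tensor: channel 0 counts positive events,
    # channel 1 counts negative events within the accumulation window.
    frame = np.zeros((2, height, width), dtype=np.float32)
    x = events[:, 0].astype(int)
    y = events[:, 1].astype(int)
    p = events[:, 3]
    # np.add.at accumulates correctly when the same pixel fires repeatedly.
    np.add.at(frame[0], (y[p > 0], x[p > 0]), 1.0)
    np.add.at(frame[1], (y[p <= 0], x[p <= 0]), 1.0)
    return frame

def mask_iou(pred, gt):
    # IoU between two boolean instance masks of identical shape.
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        return 0.0
    return np.logical_and(pred, gt).sum() / union

# Toy usage: five random events on a 4x4 sensor, then a perfect-match IoU.
rng = np.random.default_rng(0)
events = np.stack([rng.integers(0, 4, 5).astype(float),  # x
                   rng.integers(0, 4, 5).astype(float),  # y
                   np.sort(rng.random(5)),               # t
                   rng.choice([-1.0, 1.0], 5)], axis=1)
print(events_to_frame(events, 4, 4).sum())   # 5.0: all events accounted for
print(mask_iou(np.eye(4, dtype=bool), np.eye(4, dtype=bool)))  # 1.0

The accumulation window length controls a trade-off between temporal resolution and frame density; a fixed window is the simplest choice, and richer event representations vary along exactly this axis.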


Published in

ICDSP '20: Proceedings of the 2020 4th International Conference on Digital Signal Processing
June 2020, 383 pages
ISBN: 9781450376877
DOI: 10.1145/3408127
Copyright © 2020 ACM


Publisher: Association for Computing Machinery, New York, NY, United States


