ABSTRACT
Compared with traditional frame-based cameras, the event camera (also known as a dynamic vision sensor) has received increasing attention for its outstanding advantages. Inspired by biology, the camera naturally captures the dynamics of a scene with low latency, filtering out redundant information at low power consumption. Deep-learning-based instance segmentation, an influential line of research in visual recognition, could potentially benefit from these properties, but combining event-based data with deep learning still faces several challenges. In this work, we develop event-based instance segmentation that unlocks the potential of event data by combining the event camera with deep learning. To make the best use of the event data, we propose a novel event representation, the variable event stream structure (VESS), for event-based instance segmentation. Because event-based datasets are rare and none of them contains instance segmentation labels, we also produce accurate labels specialized for instance segmentation on event camera data. The proposed method is verified on this dataset; our approach reaches an average Intersection over Union (IoU) of 55.75% in real time and works properly in challenging environments such as motion blur and extreme lighting conditions.
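The abstract names two technical pieces without detail: a learned event representation (VESS) and the mask IoU metric used to report the 55.75% result. The sketch below is illustrative only; the specifics of VESS are not given here, so a common two-channel event-count image stands in for the representation, and `mask_iou` shows how the reported metric is conventionally computed on binary instance masks. All function names are assumptions, not the authors' API.

```python
import numpy as np

def events_to_frame(events, height, width):
    """Accumulate an event stream of (x, y, t, polarity) tuples into a
    2-channel count image (channel 0: positive events, channel 1: negative).
    This is a common baseline event representation, NOT the paper's VESS."""
    frame = np.zeros((2, height, width), dtype=np.float32)
    for x, y, _t, p in events:
        frame[0 if p > 0 else 1, y, x] += 1.0
    return frame

def mask_iou(pred, gt):
    """Intersection over Union between two boolean instance masks,
    as conventionally used to score instance segmentation."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    union = np.logical_or(pred, gt).sum()
    return np.logical_and(pred, gt).sum() / union if union else 0.0

# Toy usage: two overlapping 4x4 masks share 3 of 4 union pixels.
pred = np.array([[1, 1, 0, 0],
                 [1, 1, 0, 0],
                 [0, 0, 0, 0],
                 [0, 0, 0, 0]])
gt = np.array([[1, 1, 0, 0],
               [1, 0, 0, 0],
               [0, 0, 0, 0],
               [0, 0, 0, 0]])
print(mask_iou(pred, gt))  # 0.75
```

An average IoU of 55.75% means the per-instance value of `mask_iou` (or its class-wise analogue), averaged over the test set, is 0.5575.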
Index Terms
- VESS: Variable Event Stream Structure for Event-based Instance Segmentation Benchmark