End-to-End Self-Supervised Learning for Active Stereo Systems
Depth sensing is a classic problem with a long history of prior work. It’s at the heart of many tasks, from 3D reconstruction to localization and tracking. Its applications span otherwise disparate research and product areas, including indoor mapping and architecture, autonomous cars, and human body and face tracking.
With interest in virtual and augmented reality rising, depth estimation has recently taken center stage. Depth sensors are revolutionizing computer vision by providing additional 3D information for many hard problems.
Although many types of depth sensor technologies exist (shown in Fig. 1 below), each has significant limitations.
- Time of Flight (TOF) systems suffer from motion artifacts and multi-path interference.
- Structured light is vulnerable to ambient illumination and multi-device interference.
- Passive stereo struggles in texture-less regions, where expensive global optimization techniques are required — especially in traditional, non-learning-based methods.
An additional depth sensor type offers a potential solution. Active stereo uses an infrared stereo camera pair together with a patterned IR light source that projectively textures the scene with a pseudorandom pattern (Fig. 3). With a suitable choice of sensing wavelength, the camera pair captures a combination of active illumination and passive light, which improves on the quality of structured light while remaining robust in both indoor and outdoor scenarios.
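To make the underlying pipeline concrete, the sketch below shows the classical stereo computation that active illumination is meant to make robust: for a rectified image pair, disparity is estimated by matching small patches along horizontal scanlines, and depth then follows from the standard relation depth = focal_length × baseline / disparity. This is a minimal, brute-force illustration, not the method from any particular system; the function names and parameters are hypothetical.

```python
import numpy as np

def disparity_block_match(left, right, max_disp=16, patch=5):
    """Brute-force block matching on a rectified pair: for each pixel in the
    left image, find the horizontal shift into the right image that minimizes
    the sum of absolute differences (SAD) over a small patch."""
    h, w = left.shape
    r = patch // 2
    disp = np.zeros((h, w), dtype=np.float32)
    for y in range(r, h - r):
        for x in range(r, w - r):
            ref = left[y - r:y + r + 1, x - r:x + r + 1]
            best_cost, best_d = np.inf, 0
            # Only consider shifts that keep the candidate window in bounds.
            for d in range(0, min(max_disp, x - r) + 1):
                cand = right[y - r:y + r + 1, x - d - r:x - d + r + 1]
                cost = np.abs(ref - cand).sum()
                if cost < best_cost:
                    best_cost, best_d = cost, d
            disp[y, x] = best_d
    return disp

def depth_from_disparity(disp, focal_px, baseline_m):
    # Triangulation: depth = f * B / d; leave zero where disparity is zero.
    return np.where(disp > 0, focal_px * baseline_m / np.maximum(disp, 1e-6), 0.0)
```

On a texture-less surface, every candidate patch looks alike and the SAD cost is flat, so the matcher has no unique minimum; projecting a pseudorandom IR pattern injects the texture that makes these matches well defined.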
