Monocular depth estimation through virtual-world supervision and real-world SfM self-supervision

Gurram, Akhil; Tuna, Ahmet Faruk; Shen, Fengyi; Urfalioglu, Onay; López Peña, Antonio M.

doi:10.1109/TITS.2021.3117059

Permanent link: https://ddd.uab.cat/record/275235

Web of Science: 7 citations, Scopus: 10 citations

Gurram, Akhil (Huawei Munich Research Center)
Tuna, Ahmet Faruk (Huawei Munich Research Center)
Shen, Fengyi (Technische Universität München. Department of Informatics)
Urfalioglu, Onay (Huawei Munich Research Center)
López Peña, Antonio M. (Universitat Autònoma de Barcelona. Departament de Ciències de la Computació)

Date:	2022
Abstract:	Depth information is essential for on-board perception in autonomous driving and driver assistance. Monocular depth estimation (MDE) is very appealing since it allows appearance and depth to be in direct pixelwise correspondence without further calibration. The best MDE models are based on Convolutional Neural Networks (CNNs) trained in a supervised manner, i.e., assuming pixelwise ground truth (GT). Usually, this GT is acquired at training time through a calibrated multi-modal suite of sensors. However, using only a monocular system at training time as well is cheaper and more scalable. This is possible by relying on structure-from-motion (SfM) principles to generate self-supervision. Nevertheless, problems of camouflaged objects, visibility changes, static-camera intervals, textureless areas, and scale ambiguity diminish the usefulness of such self-supervision. In this paper, we perform monocular depth estimation by virtual-world supervision (MonoDEVS) and real-world SfM self-supervision. We compensate for the limitations of SfM self-supervision by leveraging virtual-world images with accurate semantic and depth supervision, and by addressing the virtual-to-real domain gap. Our MonoDEVSNet outperforms previous MDE CNNs trained on monocular and even stereo sequences.
Funding:	Agencia Estatal de Investigación TIN2017-88709-R
Note:	Antonio acknowledges the financial support to his general research activities given by ICREA under the ICREA Academia Program. Antonio acknowledges the support of the Generalitat de Catalunya CERCA Program as well as its ACCIO agency to CVC's general activities.
Rights:	All rights reserved.
Language:	English
Document:	Article ; research ; Version accepted for publication
Subject:	Training ; Estimation ; Semantics ; Cameras ; Laser radar ; Optical imaging ; Sensors
Published in:	IEEE Transactions on Intelligent Transportation Systems, Vol. 23, issue 8 (Aug. 2022), p. 12738-12751, ISSN 1558-0016


Available from: 2024-08-30
Postprint

This record appears in the collections:
Articles > Research articles
Articles > Published articles

Record created 2023-05-26, last modified 2023-05-31
