Web of Science: 7 citations, Scopus: 10 citations
Monocular depth estimation through virtual-world supervision and real-world SfM self-supervision
Gurram, Akhil (Huawei Munich Research Center)
Tuna, Ahmet Faruk (Huawei Munich Research Center)
Shen, Fengyi (Technische Universität München. Department of Informatics)
Urfalioglu, Onay (Huawei Munich Research Center)
López Peña, Antonio M. (Universitat Autònoma de Barcelona. Departament de Ciències de la Computació)

Date: 2022
Abstract: Depth information is essential for on-board perception in autonomous driving and driver assistance. Monocular depth estimation (MDE) is very appealing since it puts appearance and depth in direct pixelwise correspondence without further calibration. The best MDE models are based on Convolutional Neural Networks (CNNs) trained in a supervised manner, i.e., assuming pixelwise ground truth (GT). Usually, this GT is acquired at training time through a calibrated multi-modal suite of sensors. However, using only a monocular system at training time is cheaper and more scalable. This is possible by relying on structure-from-motion (SfM) principles to generate self-supervision. Nevertheless, problems of camouflaged objects, visibility changes, static-camera intervals, textureless areas, and scale ambiguity diminish the usefulness of such self-supervision. In this paper, we perform monocular depth estimation by virtual-world supervision (MonoDEVS) and real-world SfM self-supervision. We compensate for the SfM self-supervision limitations by leveraging virtual-world images with accurate semantic and depth supervision, and by addressing the virtual-to-real domain gap. Our MonoDEVSNet outperforms previous MDE CNNs trained on monocular and even stereo sequences.
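The abstract outlines a training objective that mixes supervised depth regression on virtual-world images (where the simulator provides dense depth ground truth) with SfM-based photometric self-supervision on real monocular sequences. The following is a minimal PyTorch-style sketch of such a combined objective, not the authors' MonoDEVSNet implementation: depth_net, pose_net, the batch keys, and the loss weighting are hypothetical placeholders, and the SSIM term, smoothness regularization, and the paper's domain-adaptation components are omitted.

import torch
import torch.nn.functional as F

def backproject(depth, K_inv):
    # Lift target pixels to 3-D camera coordinates using the predicted depth.
    # depth: (B,1,H,W), K_inv: (3,3) inverse camera intrinsics.
    b, _, h, w = depth.shape
    ys, xs = torch.meshgrid(torch.arange(h, dtype=depth.dtype),
                            torch.arange(w, dtype=depth.dtype), indexing="ij")
    ones = torch.ones_like(xs)
    pix = torch.stack([xs, ys, ones], 0).reshape(1, 3, -1).to(depth.device)  # (1,3,HW)
    rays = K_inv @ pix                                                       # viewing rays
    return rays * depth.reshape(b, 1, -1)                                    # (B,3,HW)

def project(points, K, pose, h, w):
    # Transform 3-D points by the relative pose and project into the source view,
    # returning a sampling grid normalized to [-1, 1] for grid_sample.
    b = points.shape[0]
    ones = torch.ones(b, 1, points.shape[-1], device=points.device, dtype=points.dtype)
    cam_src = (pose @ torch.cat([points, ones], 1))[:, :3]   # pose: (B,4,4)
    pix = K @ cam_src
    pix = pix[:, :2] / pix[:, 2:3].clamp(min=1e-6)
    gx = 2.0 * pix[:, 0] / (w - 1) - 1.0
    gy = 2.0 * pix[:, 1] / (h - 1) - 1.0
    return torch.stack([gx, gy], -1).reshape(b, h, w, 2)

def photometric_loss(target, warped):
    # L1 photometric error; a full SfM self-supervision loss would also blend in SSIM.
    return (target - warped).abs().mean()

def training_step(depth_net, pose_net, virtual_batch, real_batch, K, K_inv, lam=1.0):
    # (a) Virtual-world supervision: dense, accurate depth GT from the simulator.
    pred_v = depth_net(virtual_batch["image"])
    mask = virtual_batch["depth_gt"] > 0
    loss_sup = (pred_v - virtual_batch["depth_gt"]).abs()[mask].mean()

    # (b) Real-world SfM self-supervision: predict target depth and relative poses,
    # inverse-warp adjacent frames into the target view, and compare photometrically.
    tgt = real_batch["target"]
    _, _, h, w = tgt.shape
    depth_t = depth_net(tgt)
    pts = backproject(depth_t, K_inv)
    loss_selfsup = torch.zeros((), device=tgt.device)
    for src in real_batch["sources"]:
        pose = pose_net(tgt, src)                              # (B,4,4) relative pose
        grid = project(pts, K, pose, h, w)
        warped = F.grid_sample(src, grid, padding_mode="border", align_corners=True)
        loss_selfsup = loss_selfsup + photometric_loss(tgt, warped)

    # Combined objective: the virtual-world term compensates the SfM failure modes
    # listed in the abstract (scale ambiguity, textureless areas, static-camera
    # intervals); the paper additionally bridges the virtual-to-real domain gap.
    return loss_selfsup + lam * loss_sup

In this view, the supervised term on virtual images is what anchors metric scale, which pure monocular SfM self-supervision cannot recover on its own (the abstract lists scale ambiguity among its limitations).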
Grants: Agencia Estatal de Investigación TIN2017-88709-R
Note: Antonio acknowledges the financial support for his general research activities provided by ICREA under the ICREA Academia Program. Antonio also acknowledges the support of the Generalitat de Catalunya CERCA Program, as well as its ACCIO agency, to CVC's general activities.
Rights: All rights reserved.
Language: English
Document: Article ; research ; Accepted version
Subject: Training ; Estimation ; Semantics ; Cameras ; Laser radar ; Optical imaging ; Sensors
Published in: IEEE Transactions on Intelligent Transportation Systems, Vol. 23, issue 8 (Aug. 2022), p. 12738-12751, ISSN 1558-0016

DOI: 10.1109/TITS.2021.3117059


Available from: 2024-08-30
Postprint

The record appears in these collections:
Articles > Research articles
Articles > Published articles

 Record created 2023-05-26, last modified 2023-05-31


