
Semantic Segmentation of Panoramic Images: Instance Segmentation, Depth & Position Estimation

Computer-Vision-Research-Project (GitHub)   •  Computer-Vision-Research-Project (Wiki)   •  LVSN (Research Laboratory)   •  Institut intelligence et données (IID) (Research Institute)   •  Sentinelle Nord (Project)   •  J-F Lalonde Group (Team)   •  Algorithmic Photography (Course)

isabelleysseric (GitHub)   •  isabelleysseric.com (Portfolio)   •  isabelleysseric (LinkedIn)



Author: Isabelle Eysseric

Introduction

The graduate student's research consisted of capturing as much information as possible when taking panoramic photos, in order to train a model to detect the missing information.



Literature review

Several scientific papers and blog posts in the field of computer vision, and more specifically on the semantic segmentation task, were reviewed in order to better understand the expectations of the project. This required learning about the types of semantic segmentation and the techniques used to achieve them. Since semantic segmentation is a dense prediction task, i.e. a label is predicted for every pixel of the output image, the use of deep learning was necessary. U-shaped neural network architectures are commonly used for this kind of problem.
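To make the idea concrete, here is a minimal U-shaped encoder-decoder sketch in PyTorch (illustrative only, not the network used in the project): the encoder downsamples, the decoder upsamples back to full resolution, and a skip connection carries fine detail so that every pixel receives a label.

```python
# Minimal U-shaped encoder-decoder sketch in PyTorch (illustrative only,
# not the network used in the project).
import torch
import torch.nn as nn

class TinyUNet(nn.Module):
    def __init__(self, in_ch=3, n_classes=21):
        super().__init__()
        self.enc1 = nn.Sequential(nn.Conv2d(in_ch, 32, 3, padding=1), nn.ReLU())
        self.down = nn.MaxPool2d(2)                        # halve resolution
        self.enc2 = nn.Sequential(nn.Conv2d(32, 64, 3, padding=1), nn.ReLU())
        self.up = nn.ConvTranspose2d(64, 32, 2, stride=2)  # restore resolution
        self.dec = nn.Sequential(nn.Conv2d(64, 32, 3, padding=1), nn.ReLU())
        self.head = nn.Conv2d(32, n_classes, 1)            # per-pixel class scores

    def forward(self, x):
        s1 = self.enc1(x)                 # skip-connection source
        s2 = self.enc2(self.down(s1))
        u = self.up(s2)
        u = torch.cat([u, s1], dim=1)     # fuse coarse and fine features
        return self.head(self.dec(u))     # (B, n_classes, H, W): dense prediction

logits = TinyUNet()(torch.randn(1, 3, 64, 64))
print(logits.shape)  # torch.Size([1, 21, 64, 64]), one score per class per pixel
```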



Figure – Types of segmentation.



Research algorithms

Different neural networks for image processing were studied, such as Convolutional Neural Networks (CNNs) and their variants for segmentation (R-CNN, Fast R-CNN, Mask R-CNN and Mesh R-CNN), as well as the DeepLab and HoHoNet models.


As a result, two models were retained: DeepLab[23], which is currently the state of the art for semantic segmentation, and HoHoNet[24], which also handles 360-degree panoramic images. The strength of Google's DeepLab model lies in its segmentation performance on panoramic images, while that of HoHoNet lies in depth estimation in addition to semantic segmentation on high-resolution panoramic images.
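For reference, a pretrained DeepLabV3 ships with torchvision and can be run in a few lines (a sketch; the image file name and preprocessing shown here are illustrative, and the project's exact configuration may differ):

```python
# Running torchvision's pretrained DeepLabV3 on one image (illustrative sketch).
import torch
from torchvision import models, transforms
from PIL import Image

model = models.segmentation.deeplabv3_resnet101(pretrained=True).eval()

preprocess = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],  # ImageNet statistics
                         std=[0.229, 0.224, 0.225]),
])

img = Image.open("panorama.jpg").convert("RGB")  # hypothetical file name
batch = preprocess(img).unsqueeze(0)

with torch.no_grad():
    out = model(batch)["out"]             # (1, 21, H, W) class scores
labels = out.argmax(dim=1).squeeze(0)     # (H, W) per-pixel class indices
```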



Figure – Architecture of a network for semantic segmentation.



Data collection

To begin, the COCO Detection[25] dataset was downloaded in order to test the DeepLab model, with Matterport3D possibly to follow. To do this, it was necessary to connect to the laboratory server and download locally only the data needed for the segmentation task, since the entire dataset was far too large (1 TB). The Stanford2D3D[26] dataset is the one used by the HoHoNet model. Finally, the panoramic data from the LVSN laboratory were downloaded in order to produce the segmentation, depth estimation and position estimation results.
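As an illustration, once the annotation file and a subset of the images are on disk, torchvision's COCO wrapper can iterate over them (the directory layout below is hypothetical, and pycocotools must be installed):

```python
# Loading a local subset of COCO Detection with torchvision (sketch; the
# paths shown are hypothetical).
from torchvision.datasets import CocoDetection

dataset = CocoDetection(
    root="data/coco/val2017",                                # image files
    annFile="data/coco/annotations/instances_val2017.json",  # annotation JSON
)
img, targets = dataset[0]   # a PIL image and its list of annotation dicts
print(len(dataset), len(targets))
```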



Figure – Dataset from Matterport3D and COCO Detection.



Testing algorithms

Afterwards, she tried to test the Matterport3D dataset, but the generation of the masks for each image did not work. She ran into several problems, tried to solve them and then to change strategy, only to realize that the task could not be completed. The lack of support for Matterport3D and the obsolete language (Lua[27]) used by its tooling did not allow this task to be completed in time.


With little time left to complete the mandate, it was decided to change the dataset and model in order to run the tests on the laboratory data. The HoHoNet model with the Stanford2D3D dataset was selected. The model was first tested on a sample of 50 panoramic images from the LVSN laboratory, and then on the entire dataset, i.e. 2,280 panoramic images. Three results were produced for each image: semantic segmentation, depth estimation and position estimation.
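The batch run over the panoramas can be pictured as a loop of the following shape (a hedged sketch: `run_hohonet` is a stand-in for the actual HoHoNet inference code, whose real entry points live in its repository, and the file paths are hypothetical):

```python
# Hypothetical batch-inference loop over the LVSN panoramas (sketch only).
from pathlib import Path
import numpy as np

def run_hohonet(image_path):
    """Stand-in: would return (segmentation, depth, position) arrays."""
    raise NotImplementedError  # replace with the actual HoHoNet call

in_dir, out_dir = Path("panoramas"), Path("results")
out_dir.mkdir(exist_ok=True)

for img_path in sorted(in_dir.glob("*.png")):
    seg, depth, pos = run_hohonet(img_path)
    np.save(out_dir / f"{img_path.stem}_seg.npy", seg)      # semantic labels
    np.save(out_dir / f"{img_path.stem}_depth.npy", depth)  # per-pixel depth
    np.save(out_dir / f"{img_path.stem}_pos.npy", pos)      # 3D positions
```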



Figure – DeepLabV3 models from PyTorch and HoHoNet.



Results

Finally, it was possible to find, test and segment the LVSN research laboratory's panorama dataset. These results will make it possible to describe the content of the scenes in the dataset, to understand the impact of each element on the photopic and melanopic aspects of light, and possibly to create better living spaces for the people who work and live in the Far North. Ultimately, the calibrated dataset will make it possible to learn to predict these measurements from a single image.



Figure – Original image with all three results.