README.md benchmark dataset code #2069

Open
douglasmacdonald opened this issue May 19, 2024 · 17 comments
Labels
datasets (Geospatial or benchmark datasets), documentation (Improvements or additions to documentation), good first issue (A good issue for a new contributor to work on)

@douglasmacdonald

Issue

I need help getting the example code on the README.md to work. I am now concentrating on the Benchmark datasets (https://github.com/microsoft/torchgeo?tab=readme-ov-file#benchmark-datasets).

I am running on the Planetary Computer platform.

I did not have any luck with the platform's default torchgeo, so I ran

!pip install torchgeo --upgrade

And this gives me version '0.5.2'.

However, I am still having problems....

dataset = VHR10('data', download=True, checksum=True)

RuntimeError: The MD5 checksum of the download file data/NWPU VHR-10 dataset.rar does not match the one on record.

from torchgeo.datamodules.utils import collate_fn_detection

ImportError: cannot import name 'collate_fn_detection' from 'torchgeo.datamodules.utils' (/srv/conda/envs/notebook/lib/python3.11/site-packages/torchgeo/datamodules/utils.py)

Fix

I assume version problems.

@douglasmacdonald added the documentation label May 19, 2024
@adamjstewart
Collaborator

Hi @douglasmacdonald, sorry you ran into these issues!

I did not have any luck with the platform's default torchgeo

@calebrob6 do we have any contacts we can use to upgrade the default torchgeo version on PC?

RuntimeError: The MD5 checksum of the download file data/NWPU VHR-10 dataset.rar does not match the one on record.

I am not able to reproduce this. What version of torchvision are you using? TorchGeo uses torchvision download utils, and torchvision 0.17.1+ switched from requests to gdown for all Google Drive downloads. It may resolve the issue if you delete the file, upgrade to torchvision 0.17.1+, and install gdown.
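
As a rough illustration of the check described above, the following sketch decides whether gdown is needed for a given torchvision version string and whether it can be imported. The helper names (`release_tuple`, `gdown_required`, `gdown_missing`) are hypothetical and belong to neither torchvision nor TorchGeo; the 0.17.1 threshold is taken from the comment above.

```python
import importlib.util
import re


def release_tuple(version: str) -> tuple[int, ...]:
    """Return the leading numeric release of a version string.

    Suffixes such as '+cu121' or '.dev0' are ignored.
    """
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.group().split("."))


def gdown_required(torchvision_version: str) -> bool:
    """True if this torchvision version uses gdown for Google Drive downloads."""
    return release_tuple(torchvision_version) >= (0, 17, 1)


def gdown_missing(torchvision_version: str) -> bool:
    """True if gdown is required by this torchvision version but not importable."""
    return (
        gdown_required(torchvision_version)
        and importlib.util.find_spec("gdown") is None
    )
```

In a notebook this could be called as `gdown_missing(torchvision.__version__)`; if it returns True, deleting the partial download and installing gdown before retrying would follow the advice above.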

ImportError: cannot import name 'collate_fn_detection' from 'torchgeo.datamodules.utils'

This is indeed a version issue. The feature you are trying to use was added in #1082 and will be included in the 0.6.0 release.
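
For code that has to run on both 0.5.x and the development version, a guarded import is one option. This is a minimal sketch: `fallback_collate_detection` is a hypothetical stand-in written for illustration, not TorchGeo's implementation, and it simply gathers every key of the samples into a list.

```python
from typing import Any

try:
    # Present in the development version (0.6.0.dev0) according to this thread.
    from torchgeo.datamodules.utils import collate_fn_detection
except ImportError:
    # Older release, or torchgeo is not installed at all.
    collate_fn_detection = None


def fallback_collate_detection(batch: list[dict[str, Any]]) -> dict[str, list[Any]]:
    """Hypothetical fallback: gather each sample key into a list.

    Lists are used instead of stacked tensors because detection samples
    can have different image sizes and different numbers of boxes.
    """
    if not batch:
        return {}
    return {key: [sample[key] for sample in batch] for key in batch[0]}


# Prefer the library function when it exists, otherwise use the sketch above.
collate_fn = collate_fn_detection or fallback_collate_detection
```

The resulting `collate_fn` can then be passed to a `DataLoader` through its `collate_fn` argument.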

My personal recommendation would be to pick a different dataset; VHR-10 is actually one of the more complicated ones. If you're completely new to PyTorch, you're better off starting with a torchvision tutorial. All TorchGeo NonGeoDatasets are designed to be functionally identical to torchvision datasets, so if you know how to use torchvision, you know how to use torchgeo. If you still want to use VHR-10, either install the development version (0.6.0.dev0) or wait for the 0.6.0 release (maybe in 1 month?).

@adamjstewart added the datasets label May 19, 2024
@douglasmacdonald
Author

Hello,

One moment!

Could it have anything to do with using:

pip install torchgeo==0.5.2

Where I maybe should be using

pip install torchgeo[all]==0.5.2

?

Best,
Douglas

@adamjstewart
Collaborator

adamjstewart commented May 19, 2024

VHR-10 requires 3 optional dependencies:

  1. gdown: to download the dataset if using torchvision 0.17.1+
  2. rarfile: to extract the .rar file
  3. pycocotools: to read the labels for the 'positive' set

Running pip install torchgeo will not install any of these. Running pip install torchgeo[datasets] will install 2 and 3. Torchvision does not currently automatically install 1 for you, so you have to install it yourself.
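
Before calling `VHR10(download=True)`, the three optional dependencies listed above can be checked up front. The sketch below uses only the standard library; the function name `missing_modules` and the constant `VHR10_OPTIONAL_DEPS` are hypothetical, and the module names are assumed to match the package names given in the list.

```python
import importlib.util

# Optional dependencies named in this thread and what each is needed for.
VHR10_OPTIONAL_DEPS = {
    "gdown": "download from Google Drive (torchvision 0.17.1+)",
    "rarfile": "extract the .rar archive",
    "pycocotools": "read the labels for the 'positive' set",
}


def missing_modules(names: list[str]) -> list[str]:
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    for name in missing_modules(list(VHR10_OPTIONAL_DEPS)):
        print(f"missing {name}: needed to {VHR10_OPTIONAL_DEPS[name]}")
```

Note that `rarfile` also relies on an external tool such as unrar, unar, or bsdtar (see the TODO visible in the traceback further down this thread), which this check does not cover.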

We may want to add 1 to [datasets]. If you want to submit a PR to do this, I would be happy to review it.

EDIT: I opened pytorch/vision#8430 to help better document this. With this, we could use torchvision in our required deps and torchvision[gdown] in our optional deps.

@adamjstewart changed the title from "README.md benchman dataset code" to "README.md benchmark dataset code" May 19, 2024
@isaaccorley
Collaborator

Should we also change the example to use a different dataset like InriaAIL or EuroSAT?

@adamjstewart
Collaborator

The example is fixed (it should work now on main), but I would be happy to change to a different dataset too. I only used that example because VHR-10 was the first dataset I wrote and it had a cool prediction plot.

@robmarkcole
Contributor

robmarkcole commented May 24, 2024

I've just attempted with torchgeo.__version__ == '0.6.0.dev0' and get a different error:

from torch.utils.data import DataLoader

from torchgeo.datamodules.utils import collate_fn_detection
from torchgeo.datasets import VHR10

# Initialize the dataset
dataset = VHR10(download=True)

---------------------------------------------------------------------------
NotRarFile                                Traceback (most recent call last)
Cell In[2], line 7
      4 from torchgeo.datasets import VHR10
      6 # Initialize the dataset
----> 7 dataset = VHR10(download=True)

File /home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/torchgeo/datasets/vhr10.py:218, in VHR10.__init__(self, root, split, transforms, download, checksum)
    215 self.checksum = checksum
    217 if download:
--> 218     self._download()
    220 if not self._check_integrity():
    221     raise DatasetNotFoundError(self)

File /home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/torchgeo/datasets/vhr10.py:343, in VHR10._download(self)
    340     return
    342 # Download images
--> 343 download_and_extract_archive(
    344     self.image_meta['url'],
    345     self.root,
    346     filename=self.image_meta['filename'],
    347     md5=self.image_meta['md5'] if self.checksum else None,
    348 )
    350 # Annotations only needed for "positive" image set
    351 if self.split == 'positive':
    352     # Download annotations

File /home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/torchgeo/datasets/utils.py:145, in download_and_extract_archive(url, download_root, extract_root, filename, md5)
    143 archive = os.path.join(download_root, filename)
    144 print(f'Extracting {archive} to {extract_root}')
--> 145 extract_archive(archive, extract_root)

File /home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/torchgeo/datasets/utils.py:99, in extract_archive(src, dst)
     97 for suffix, extractor in suffix_and_extractor:
     98     if src.endswith(suffix):
---> 99         with extractor(src, 'r') as f:
    100             f.extractall(dst)
    101         return

File /home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/torchgeo/datasets/utils.py:48, in _rarfile.RarFile.__enter__(self)
     45 rarfile = lazy_import('rarfile')
     46 # TODO: catch exception for when rarfile is installed but not
     47 # unrar/unar/bsdtar
---> 48 return rarfile.RarFile(*self.args, **self.kwargs)

File /home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/rarfile.py:711, in RarFile.__init__(self, file, mode, charset, info_callback, crc_check, errors, part_only)
    708 if mode != "r":
    709     raise NotImplementedError("RarFile supports only mode=r")
--> 711 self._parse()

File /home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/rarfile.py:930, in RarFile._parse(self)
    928     self._file_parser = p5  # noqa
    929 else:
--> 930     raise NotRarFile("Not a RAR file")
    932 self._file_parser.parse()
    933 self.comment = self._file_parser.comment

NotRarFile: Not a RAR file

I get the same error from the command line:

⚡ ~/data unrar e NWPU\ VHR-10\ dataset.rar 

UNRAR 5.61 beta 1 freeware      Copyright (c) 1993-2018 Alexander Roshal

NWPU VHR-10 dataset.rar is not RAR archive
No files to extract

OK it appears the file downloaded by torchgeo was somehow corrupted - if I manually download via the browser there is no issue.

However I then get Dataset not found in root='data', resolved by commenting out self._check_integrity.

Appears to be because there is a check: Checking integrity of data/NWPU VHR-10 dataset.rar which I deleted.

I just comment out that check and the dataset loads fine, however the image is not contrast stretched:
[Image: VHR-10 sample plotted without contrast stretching]

That is resolved using percentile_normalization and I am now ready to go:
[Image: sample plot after applying percentile_normalization]

@isaaccorley
Collaborator

@robmarkcole what version of torchvision are you using? You can turn off the integrity check by passing checksum=False to the dataset/datamodule.

@robmarkcole
Contributor

robmarkcole commented May 24, 2024

'0.6.0.dev0' via pip install git+https://github.com/microsoft/torchgeo.git@main#egg=torchgeo

Can confirm no issues using checksum=False so long as the data is downloaded without corruption and the rar is present.

There appears to be another issue, which I believe is due to collate_fn_detection not being applied. When using the dataset I get this error (traceback excerpt from torchgeo/trainers/detection.py):

    252 """Compute the validation metrics.
    253 
    254 Args:
   (...)
    257     dataloader_idx: Index of the current dataloader.
    258 """
    259 x = batch['image']
--> 260 batch_size = x.shape[0]
    261 y = [
    262     {'boxes': batch['boxes'][i], 'labels': batch['labels'][i]}
    263     for i in range(batch_size)
    264 ]
    265 y_hat = self(x)

AttributeError: 'list' object has no attribute 'shape'

Usage:

# Imports are an assumption: they were not shown in the original comment.
# `dataset` is the VHR10 instance created earlier in the notebook.
import lightning as L
from torch.utils.data import DataLoader
from torchgeo.datamodules.utils import collate_fn_detection

class VHR10DataModule(L.LightningDataModule):
    def __init__(self, data_dir: str = "", batch_size: int = 4, num_workers: int = 0):
        super().__init__()
        self.data_dir = data_dir
        self.batch_size = batch_size
        self.num_workers = num_workers

    def setup(self, stage: str):
        # Nothing to set up: the dataset is created outside the datamodule.
        return

    def train_dataloader(self):
        return DataLoader(dataset, batch_size=self.batch_size, collate_fn=collate_fn_detection, num_workers=self.num_workers)

    def val_dataloader(self):
        return DataLoader(dataset, batch_size=self.batch_size, collate_fn=collate_fn_detection, num_workers=self.num_workers)

    def test_dataloader(self):
        return DataLoader(dataset, batch_size=self.batch_size, collate_fn=collate_fn_detection, num_workers=self.num_workers)

datamodule = VHR10DataModule(data_dir="data", batch_size=4, num_workers=0)
datamodule.setup("fit")

@isaaccorley
Collaborator

The collate function is being applied, but the trainer expects a single image tensor and does not accept a list of images. That is definitely a bug: each image in the VHR-10 dataset is a different size, so they can't be stacked properly.

@robmarkcole
Contributor

In this batch, the images all had different shapes - presume I just need to add a cropping augmentation?

torch.Size([3, 808, 958])
torch.Size([3, 806, 950])
torch.Size([3, 803, 889])
torch.Size([3, 732, 946])

@isaaccorley
Collaborator

Yep that's correct. I know in the past that Kornia had some bugs with the augmentations not properly being applied to the boxes but that appears to have been fixed.

@robmarkcole
Contributor

robmarkcole commented May 24, 2024

OK just noticed from torchgeo.datamodules import VHR10DataModule which handles this :-)

So in summary:

  1. The rar file being corrupted was the cause of my issue
  2. The plotting is off without percentile_normalization

@isaaccorley
Collaborator

Oof I thought you were already using it or I would have suggested the datamodule. I'll take a look at fixes for this. Thanks for being an A+ test engineer!

@adamjstewart
Collaborator

Trying to catch up on this thread...

@calebrob6 do we have any contacts we can use to upgrade the default torchgeo version on PC?

Well, that didn't age well. Looks like PC will be shutting down, so no need to worry about this anymore.

The rar file being corrupted was the cause of my issue

If we're seeing intermittent issues with GDrive, we could rehost the dataset on HF. It appears to be released under an MIT license.

The plotting is off without percentile_normalization

I'm happy to submit a PR to fix this, but then no one will review it... Does anyone else want to submit a PR?

@ashnair1 was the last person to touch this dataset.

@adamjstewart added this to the 0.5.3 milestone May 25, 2024
@calebrob6
Member

PC Hub (the free compute) is shutting down; the 50+ PB of data hosting, the APIs that let you index into it, the Explorer for visualizing it, and the catalog are all unchanged AFAIK.

@adamjstewart added the good first issue label May 25, 2024
@ashnair1
Collaborator

Good catch regarding normalization. The images are stored as uint8 but are loaded as floats. During training, the datamodule normalizes them to a range of 0-1 before plotting, which is why the training plots look normal. When plotting samples directly via the dataset's plot method, however, the tensor is a float with values ranging from 0 to 255, which makes the plot incorrect.
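
The effect of a percentile stretch can be shown on a flat list of pixel values. This is a pure-Python sketch written for illustration, not TorchGeo's `percentile_normalization`; the function names and the 2/98 default percentiles are assumptions. It maps the lower percentile to 0 and the upper percentile to 1, clipping everything outside that range, which is the kind of 0-1 input an image plotting routine expects for float data.

```python
import math


def percentile(sorted_values: list[float], q: float) -> float:
    """Linearly interpolated q-th percentile (0 to 100) of pre-sorted values."""
    pos = (len(sorted_values) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    frac = pos - lo
    return sorted_values[lo] * (1.0 - frac) + sorted_values[hi] * frac


def percentile_stretch(
    values: list[float], lower: float = 2.0, upper: float = 98.0
) -> list[float]:
    """Rescale values so the lower/upper percentiles map to 0 and 1, with clipping."""
    if not values:
        return []
    ordered = sorted(values)
    lo = percentile(ordered, lower)
    hi = percentile(ordered, upper)
    if hi <= lo:
        # Constant input: nothing to stretch.
        return [0.0 for _ in values]
    return [min(1.0, max(0.0, (v - lo) / (hi - lo))) for v in values]
```

With `lower=0` and `upper=100` this reduces to a plain min-max rescale, so values in the range 0-255 are simply divided by 255.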

@adamjstewart
Collaborator

@burakekim is going to inquire about redistributing VHR-10 on Hugging Face, which will allow us to get rid of the Google Drive issues and remove dependencies on rarfile and gdown. Hopefully that will solve some of the issues you encountered!

I think we should also replace the README example with a simpler dataset like EuroSAT, which will finally close this issue.

6 participants