Pre-Processing Question #591

Hamish-Cam · 2022-01-27T17:38:11Z

Hi! I'm really enjoying using torchGeo for a team research project. However, I have a design/structure question that I'm hoping someone might be able to help us with. The question is regarding preprocessing of input raster files to a CNN (which we are using torchGeo to implement).

The current data pipeline is as follows:

1. We run a pre-processing script to do various things to our input datasets e.g. cropping the raster data to a region of interest (a hexagonal region to be precise), changing scalar values to binary values etc.
2. Once pre-processing is complete, we save these processed data files to post-processed raster files.
3. These post-processed raster files are then pointed to in torchGeo and all the great ML functionality can take place using the torchGeo framework.
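For readers following along, the first stage described above (crop to a hexagon, then binarize) could be sketched roughly as follows. This is not the poster's actual script: it assumes the raster band is already loaded as a 2D NumPy array, and the function names, the flat-topped hexagon geometry, the threshold, and the choice of 0 as the fill value outside the region are all hypothetical.

```python
import numpy as np


def hexagon_mask(height, width, center, radius):
    """Boolean mask that is True inside a flat-topped regular hexagon.

    `center` is (row, col) in pixel coordinates and `radius` is the
    circumradius in pixels. Both are illustrative parameters.
    """
    cy, cx = center
    rows, cols = np.mgrid[0:height, 0:width]
    dx = np.abs(cols - cx)
    dy = np.abs(rows - cy)
    s3 = np.sqrt(3.0)
    # Inside the top/bottom edges AND inside the four slanted edges.
    return (dy <= s3 / 2 * radius) & (s3 * dx + dy <= s3 * radius)


def preprocess(raster, center, radius, threshold):
    """Binarize a 2D raster and zero out everything outside the hexagon."""
    mask = hexagon_mask(raster.shape[0], raster.shape[1], center, radius)
    binary = (raster > threshold).astype(np.uint8)
    binary[~mask] = 0  # assumption: 0 doubles as the "outside ROI" value
    return binary


raster = np.full((9, 9), 5.0)
processed = preprocess(raster, center=(4, 4), radius=3, threshold=1.0)
```

Writing `processed` back to a raster file (stage 2) would need a geospatial I/O library and the original georeferencing metadata, which is left out here.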

What we think would be preferable would be to integrate the two stages into one, such that the pre-processing steps can be completed in torchGeo by simply calling methods for custom classes we have created for those data types. However, the main issue we have encountered is that indexing layers within data-stacks is difficult (unless you use bounding boxes), which I understand to be a design choice that is potentially required when using large amounts of data (as we will be).

Essentially my question is this: Do you recommend separating pre-processing functionality from torchGeo functionality or not? If you recommend integrating it, how is it best to do this (i.e. complete processing on a whole data-stack or only on a sample once taken etc.)?

Any help/advice would be greatly appreciated, thanks!

adamjstewart (Maintainer) · 2022-01-27T20:12:27Z

> Do you recommend separating pre-processing functionality from torchGeo functionality or not?

With pre-processing, there's always a space/time tradeoff. For example, if you always pre-process all of your files, you'll end up doubling how much storage space you use, but loading the data will be faster. If you have the space to spare, I would personally pre-process everything to improve loading speed. If space is tight, all of these pre-processing steps can be done within TorchGeo.

> If you recommend integrating it, how is it best to do this (i.e. complete processing on a whole data-stack or only on a sample once taken etc.)?

For "cropping the raster data to a region of interest", this can be done by passing an roi bounding box to the sampler. Unfortunately, we don't support hexagonal bounding boxes. For "changing scalar values to binary values etc.", I would write a custom transform that acts on a single sample (the transforms argument of the dataset) or on a batch of samples (similar to how Kornia works).
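As an illustration of the custom-transform suggestion, here is a minimal sketch of a binarizing transform. It assumes each sample is a dictionary holding a tensor under an "image" key; the class name `Binarize`, the `key` parameter, and the threshold value are hypothetical and not part of TorchGeo.

```python
import torch


class Binarize:
    """Hypothetical transform: threshold one tensor in a sample dict."""

    def __init__(self, threshold: float, key: str = "image"):
        self.threshold = threshold
        self.key = key

    def __call__(self, sample: dict) -> dict:
        sample = dict(sample)  # shallow copy so the caller's dict is untouched
        # Elementwise comparison, so (C, H, W) samples and (B, C, H, W)
        # batches are both handled.
        sample[self.key] = (sample[self.key] > self.threshold).to(torch.float32)
        return sample


sample = {"image": torch.tensor([[[0.2, 0.7], [1.5, 0.0]]])}
binarized = Binarize(threshold=0.5)(sample)
```

An instance such as `Binarize(threshold=0.5)` could then be supplied through the dataset's transforms argument mentioned above, or applied to each batch inside the training loop.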

Hamish-Cam (Author) · 2022-01-30T19:32:27Z

Thanks for your answer @adamjstewart.

In torchGeo, is there a way to separate a single raster input that contains integer class labels into one-hot encoded layers? Or is this something that we would have to do as a pre-processing step?

adamjstewart (Maintainer) · 2022-01-30T19:53:25Z

I think you want to use torch.nn.functional.one_hot. TorchGeo only includes transforms that are specific to working with geospatial data, not generic transforms needed by all PyTorch users.
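A minimal sketch of how that might look for a single label raster, assuming the labels are an (H, W) integer tensor with values in [0, num_classes). The helper name `one_hot_mask` is hypothetical. Note that `one_hot` appends the class dimension last, so it is moved to the front here to get one layer per class.

```python
import torch
import torch.nn.functional as F


def one_hot_mask(mask: torch.Tensor, num_classes: int) -> torch.Tensor:
    """Convert an (H, W) integer label mask to (num_classes, H, W) layers."""
    # one_hot requires an int64 tensor and returns shape (H, W, num_classes).
    encoded = F.one_hot(mask.long(), num_classes=num_classes)
    # Move the class dimension to the front (channel-first layout).
    return encoded.permute(2, 0, 1)


labels = torch.tensor([[0, 1], [2, 1]])
layers = one_hot_mask(labels, num_classes=3)
```

Each `layers[k]` is then a binary layer marking the pixels that belong to class `k`.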

adamjstewart (Maintainer) · 2022-02-09T21:05:34Z

It sounds like all of your questions have been answered, so I'll close this. Let me know if you have any other questions!
