Pre-Processing Question #591
Replies: 4 comments
-
With pre-processing, there's always a space/time tradeoff. For example, if you always pre-process all of your files, you'll end up doubling how much storage space you use, but loading the data will be faster. If you have the space to spare, I would personally pre-process everything to improve loading speed. If space is tight, all of these pre-processing steps can be done within TorchGeo.
For "cropping the raster data to a region of interest", this can be done by specifying an |
-
Thanks for your answer @adamjstewart. In TorchGeo, is there a way to separate a single raster input that contains integer class labels into one-hot encoded layers? Or is this something that we would have to do as a pre-processing step?
-
I think you want to use |
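For readers following along, here is a minimal sketch of one way to one-hot encode an integer label raster in plain PyTorch. It is not necessarily the function referred to above. It assumes the sample is a dictionary with a `"mask"` entry of shape (H, W) holding class indices from 0 to `num_classes - 1`; the helper name `one_hot_mask` is hypothetical.

```python
import torch
import torch.nn.functional as F


def one_hot_mask(sample: dict, num_classes: int) -> dict:
    """Return a copy of `sample` whose (H, W) integer mask is one-hot encoded.

    Assumption: sample["mask"] holds class indices in [0, num_classes).
    """
    mask = sample["mask"].long()  # F.one_hot requires an int64 tensor
    encoded = F.one_hot(mask, num_classes=num_classes)  # shape (H, W, C)
    # Move the class axis first so each class becomes its own layer: (C, H, W)
    return {**sample, "mask": encoded.permute(2, 0, 1).float()}


# Tiny 2x2 label raster with three classes
sample = {"mask": torch.tensor([[0, 2], [1, 1]])}
out = one_hot_mask(sample, num_classes=3)
```

Because the function takes and returns a sample dictionary, it could be applied per sample at load time (for example, wrapped with `functools.partial` and passed as a dataset's `transforms` argument) rather than as a separate step that writes new files to disk. If the mask carries a leading channel dimension, squeeze it before encoding.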
-
It sounds like all of your questions have been answered, so I'll close this. Let me know if you have any other questions!
-
Hi! I'm really enjoying using TorchGeo for a team research project. However, I have a design/structure question that I'm hoping someone might be able to help us with. The question concerns the pre-processing of input raster files for a CNN (which we are implementing with TorchGeo).
The current data pipeline is as follows:
We think it would be preferable to integrate the two stages into one, so that the pre-processing steps can be completed in TorchGeo by simply calling methods on the custom classes we have created for those data types. However, the main issue we have encountered is that indexing layers within data stacks is difficult (unless you use bounding boxes), which I understand to be a design choice that may be required when working with large amounts of data (as we will be).
Essentially, my question is this: do you recommend separating pre-processing functionality from TorchGeo functionality or not? If you recommend integrating it, how is this best done (i.e. should processing be applied to a whole data stack, or only to a sample once it has been taken)?
Any help/advice would be greatly appreciated, thanks!