
get_steps_per_epoch fails if block_shape is None #278

Closed
hvgazula opened this issue Mar 4, 2024 · 4 comments · Fixed by #295

hvgazula commented Mar 4, 2024

The `get_steps_per_epoch` method of the `nobrainer.dataset.Dataset` class assumes blocks are created from the input images. The method needs refactoring to handle the `block_shape = None` case.

```python
import math

import numpy as np


def get_steps_per_epoch(self):
    def get_n(a, k):
        # Number of non-overlapping blocks of size k along an axis of size a.
        return (a - k) / k + 1

    # Fails here when self.block_shape is None: zip() cannot iterate it.
    n_blocks = tuple(
        get_n(aa, kk) for aa, kk in zip(self.volume_shape, self.block_shape)
    )
    for n in n_blocks:
        if not n.is_integer() or n < 1:
            raise ValueError(
                "cannot create non-overlapping blocks with the given parameters."
            )
    n_blocks_per_volume = np.prod(n_blocks).astype(int)
    steps = n_blocks_per_volume * self.n_volumes / self.batch_size
    steps = math.ceil(steps)
    return steps
```
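One way to handle the missing case (a sketch only; the standalone function and its fallback behavior are my assumptions, not the fix that was merged in #295) is to fall back to the full volume shape when `block_shape` is `None`, so each volume contributes exactly one block:

```python
import math


def steps_per_epoch(volume_shape, block_shape, n_volumes, batch_size):
    """Standalone sketch; argument names mirror the Dataset attributes."""
    # With block_shape=None, treat the whole volume as a single block.
    if block_shape is None:
        block_shape = volume_shape

    def get_n(a, k):
        # Number of non-overlapping blocks of size k along an axis of size a.
        return (a - k) / k + 1

    n_blocks = tuple(get_n(a, k) for a, k in zip(volume_shape, block_shape))
    for n in n_blocks:
        if not n.is_integer() or n < 1:
            raise ValueError(
                "cannot create non-overlapping blocks with the given parameters."
            )
    n_blocks_per_volume = math.prod(int(n) for n in n_blocks)
    return math.ceil(n_blocks_per_volume * n_volumes / batch_size)
```

For example, 256³ volumes cut into 128³ blocks yield 8 blocks per volume, while `block_shape=None` yields one step per `batch_size` volumes.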

@hvgazula hvgazula added the bug label Mar 4, 2024
@hvgazula hvgazula self-assigned this Mar 4, 2024
hvgazula commented Mar 5, 2024

It won't fail after all; `block_shape` is never `None` because the property derives it from the dataset's element spec:

```python
@property
def block_shape(self):
    return tuple(self.dataset.element_spec[0].shape[1:4].as_list())
```

@hvgazula hvgazula closed this as completed Mar 5, 2024
@hvgazula hvgazula reopened this Mar 6, 2024
hvgazula commented Mar 6, 2024

It fails if `n_volumes` is not set, which is the case when the `from_tfrecords` function is called.

hvgazula commented Mar 6, 2024

This line:

```python
block_length = len([0 for _ in first_shard])
```

is a huge bottleneck in the code. For context, please refer to https://stackoverflow.com/questions/70992022/how-to-get-the-correct-cardinality-of-a-tensorflow-dataset-after-filtering
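The expression counts elements by iterating the entire shard, so merely learning the length forces a full pass that materializes every record. A minimal illustration of the pattern (a plain-Python stand-in, not the TensorFlow code):

```python
def count_by_iteration(iterable):
    # Equivalent to len([0 for _ in iterable]): every element must be
    # produced (and, for a TFRecord dataset, parsed) just to be counted.
    return sum(1 for _ in iterable)
```

For a `tf.data` pipeline this is often the only option because `Dataset.cardinality()` reports `UNKNOWN_CARDINALITY` after operations such as `filter`, which is the situation discussed in the linked Stack Overflow question.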

@hvgazula

This is resolved by calculating the number of files within each shard. It doesn't account for the last shard possibly having fewer files, but that shouldn't impact the training process in any way.
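A sketch of that resolution (function and argument names are illustrative, under the stated assumption that every shard holds the same number of examples):

```python
import math


def n_volumes_from_shards(n_shards, examples_per_shard):
    # Assumes full shards; if the last shard is short, this slightly
    # overestimates n_volumes, which only pads steps_per_epoch upward.
    return n_shards * examples_per_shard


def steps_from_shards(n_shards, examples_per_shard, blocks_per_volume, batch_size):
    n_volumes = n_volumes_from_shards(n_shards, examples_per_shard)
    return math.ceil(n_volumes * blocks_per_volume / batch_size)
```

Deriving the count from file metadata this way avoids the full pass over the records entirely.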

hvgazula added a commit that referenced this issue Mar 12, 2024
@hvgazula hvgazula linked a pull request Mar 20, 2024 that will close this issue