Got stuck in getBatch with larger batch size #21
Hi, I am not sure what you are trying to accomplish with this sample code. Can you provide a high-level explanation of what you want to use Dataset for? Thanks.
In my actual application, I'm usually trying to do something like this:

```lua
getBatch, numBatches, reset = dataset.sampledBatcher({
   batchSize = opt.batchSize,
   inputDims = {10, 256},
   verbose = true,
   poolSize = 4,
   get = function(x)
      return torch.load(x) -- or some other loading function like image.load / npy4th.load
   end,
   processor = function(res, processorOpt, input)
      local x = augment(res) -- some data augmentation function
      input:copy(x)
      return true
   end,
})
```

I use a custom get function to load the data and do some data augmentation in processor. This issue happens to me in several similar scenarios: with a larger batch size, the call gets stuck. Thanks for your help.
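For reference, this is roughly how I consume the returned functions in my training loop (a sketch only; trainStep and opt.numEpochs are placeholders, and the exact batch fields depend on what the processor fills in):

```lua
for epoch = 1, opt.numEpochs do
   reset()                        -- start a fresh sampling pass
   for _ = 1, numBatches() do
      local batch = getBatch()    -- this is the call that hangs for me
      trainStep(batch.input)      -- placeholder for the actual training step
   end
end
```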
Try not setting the poolSize option; that's a tricky one to set.
Yes, I find that not setting poolSize removes this error, but sometimes when I run for a longer time the process gets killed (it just prints "Killed" to stderr), and I haven't figured out why yet. I suspect it is caused by creating too many threads. Should poolSize be limited by the number of cores on the machine? Are there any guidelines for how to set it?
It's not really meant for users to set; I should probably remove it. The threads are created once at the start, and no more are created after that, so it doesn't make sense that the crash is due to too many threads. The way you are using Dataset, putting torch.load in a custom get function, will create a ton of garbage and definitely won't be speedy. How is your data laid out? Is it a whole bunch of little files on disk? If you describe your data, I can help you use Dataset to sample it efficiently.
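As a stopgap before changing the data layout, one way to cut down the allocation churn from torch.load is to memoize decoded tensors inside the get function. A minimal sketch, assuming the working set fits in memory; cachedGet, cache, and maxCacheEntries are illustrative names, not part of Dataset:

```lua
-- Hypothetical cache of decoded tensors, so repeated draws of the same
-- file skip the expensive, garbage-heavy torch.load call.
local cache = { }
local numCached = 0
local maxCacheEntries = 256   -- illustrative bound on memory use

local function cachedGet(path)
   local x = cache[path]
   if x == nil then
      x = torch.load(path)
      if numCached < maxCacheEntries then
         cache[path] = x
         numCached = numCached + 1
      end
   end
   return x
end
```

Note that if get runs inside worker threads, each thread will hold its own copy of the cache, so size the bound with that in mind.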
I'm processing video data, which is saved on a hard drive mounted in the system. Usually I save it in two formats:
Thanks a lot for your help! |
Hi, you can now adjust poolSize as much as you want. The deadlock has been fixed in the IPC package (https://github.com/twitter/torch-ipc). Just get the latest version of it and you should be good to go. Thanks.
The following code reproduces the error:
Strangely, the program works when batchSize is 64, but gets stuck in getBatch() when batchSize is 128.
I have come across this problem in several different settings that use a custom get function and load data with a non-default method like image.load; batchSize 64 works but 128 does not.
Any idea is appreciated. Thanks!
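In outline, the failing setup looks like this (a hedged sketch of the configuration described above, not the exact code from this report; the require line, index path, and dimensions are placeholders):

```lua
local Dataset = require 'dataset.Dataset'      -- assumed module path
local image = require 'image'

local dataset = Dataset('/path/to/index.csv')  -- placeholder index
local getBatch, numBatches, reset = dataset.sampledBatcher({
   batchSize = 128,              -- 64 works, 128 hangs
   inputDims = {3, 224, 224},    -- placeholder dimensions
   get = function(x)
      return image.load(x)       -- non-default loader, as described
   end,
   processor = function(res, processorOpt, input)
      input:copy(res)
      return true
   end,
})

reset()
local batch = getBatch()         -- never returns when batchSize is 128
```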