No speed advantage when using batches. #58
Comments
Apparently CRAFT can run in batches: clovaai/CRAFT-pytorch#44 (comment) and other comments in the issue section of CRAFT's GitHub repository state that batch prediction is feasible.
Also, I think it would make more sense to decouple the batch size used by parseq for text recognition from the tamil-ocr batch size parameter. These should be two separate numbers.
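To illustrate the decoupling idea, here is a minimal sketch in which the recognizer is fed fixed-size chunks of word crops, independent of how many images were passed in. `recognize_in_chunks`, `rec_batch_size` and `fake_recognizer` are hypothetical names made up for this example; they are not part of tamil_ocr or parseq.

```python
from typing import Callable, List, Sequence


def recognize_in_chunks(
    crops: Sequence,
    recognizer: Callable[[Sequence], List[str]],
    rec_batch_size: int,
) -> List[str]:
    """Run the recognizer over crops in chunks of rec_batch_size.

    rec_batch_size is a hypothetical parameter, separate from the number
    of images handed to the OCR pipeline per call.
    """
    if rec_batch_size < 1:
        raise ValueError("rec_batch_size must be at least 1")
    texts: List[str] = []
    for start in range(0, len(crops), rec_batch_size):
        chunk = crops[start:start + rec_batch_size]
        texts.extend(recognizer(chunk))
    return texts


# Stand-in recognizer (assumption): records chunk sizes and upper-cases strings.
seen_sizes: List[int] = []


def fake_recognizer(chunk: Sequence) -> List[str]:
    seen_sizes.append(len(chunk))
    return [str(item).upper() for item in chunk]


texts = recognize_in_chunks(list("abcde"), fake_recognizer, rec_batch_size=2)
```

With this split, the image batch size only bounds how many pages are in flight, while `rec_batch_size` bounds the recognizer's memory use.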
Hi @Dario-Mantegazza, thanks for your feedback. I will try to include batch mode for CRAFT text detection in the coming weeks.
Hi again @gnana70, in the meantime I will make a fork and see if I can implement a temporary workaround. I will keep you posted.
Hi @Dario-Mantegazza, thanks for your help. Please share your workaround once done.
So I tried to change the code in the simplest, hackiest way, but for now I don't get better performance. I think something is broken in my edited version: all the models accept batched input, but something else curbs the performance gain. I will upload my partially working version to my fork, but due to work deadlines I don't think I can spend more time on this.
@Dario-Mantegazza, no problem. I will investigate and fix it.
Most of the processing time appears to be spent in the cv2/numpy code that extracts the detected word images from the main image. I swapped this code out for a simple min/max rectangle, and the time for a page I was testing on went from 360s to under 15s. For images with larger numbers of bounding boxes the speedup will be even more drastic, since this reduces the cost from 1-2 seconds per bounding box to around 1/100000 of a second per bounding box. The only downside is that this doesn't straighten the text; it just pulls out a bounding box. This works for my use case, though, since I am extracting from documents without any tilted text. Here are the timings before and after for the portion of the code I was working in: [timing screenshots: Before, After]
Code is at https://github.com/JamesDConley/faster_tamil_ocr
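For readers who want the gist of the min/max rectangle idea without opening the fork, here is a minimal sketch. It is not the code from the linked repository; `crop_min_max` is a hypothetical helper, and the box is assumed to be a list of (x, y) corner points such as a text detector might return.

```python
import numpy as np


def crop_min_max(image: np.ndarray, box) -> np.ndarray:
    """Return the axis-aligned rectangle enclosing a polygon box.

    No perspective warp is applied, so tilted text stays tilted.
    """
    pts = np.asarray(box, dtype=np.float32).reshape(-1, 2)
    height, width = image.shape[:2]
    # Round outwards and clamp to the image bounds.
    x0 = int(max(0, np.floor(pts[:, 0].min())))
    y0 = int(max(0, np.floor(pts[:, 1].min())))
    x1 = int(min(width, np.ceil(pts[:, 0].max())))
    y1 = int(min(height, np.ceil(pts[:, 1].max())))
    # Slicing returns a view, so no pixel data is copied here.
    return image[y0:y1, x0:x1]


# Example on a blank 100x200 page with one slightly skewed quadrilateral.
page = np.zeros((100, 200, 3), dtype=np.uint8)
quad = [[10.2, 20.7], [60.0, 22.1], [59.5, 40.0], [9.8, 38.4]]
word = crop_min_max(page, quad)
```

Because the crop is a plain slice, its cost does not depend on the size of the box, which is consistent with the per-box timings reported above. The trade-off is the one already mentioned: rotated or skewed words are not rectified.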
I ran some tests using detection + recognition together on a set of 30 images, and I saw no speed improvement when using batches.
So I checked the code and if I got it right in your implementation,
tamil_ocr/ocr_tamil/ocr.py
Lines 527 to 536 in 71a91db
I'm not an expert in Parseq, but if it can already deal with batches of bounding boxes (BBs), why not simply take all the BBs from the whole batch and pass them as a single input to parseq?
To recap my suggestion, why don't you do something like the following:
This should be faster, as you call parseq only once per batch and not once per image, albeit at a larger memory cost, but that can be dealt with through the batch size parameter.
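As a rough illustration of "one recognizer call per batch", the sketch below flattens the crops from every image, calls the recognizer once, and then splits the results back per image. This is not the tamil_ocr implementation; `recognize_batch` and `counting_recognizer` are hypothetical, and the recognizer is assumed to be any callable that maps a list of crops to a list of strings in the same order.

```python
from typing import Callable, List, Sequence


def recognize_batch(
    crops_per_image: Sequence[Sequence],
    recognizer: Callable[[Sequence], List[str]],
) -> List[List[str]]:
    """Recognize all crops of all images with a single recognizer call."""
    # Flatten: one list holding every crop from every image, in order.
    flat = [crop for crops in crops_per_image for crop in crops]
    texts = recognizer(flat) if flat else []

    # Regroup: slice the flat result back into one list per image.
    results: List[List[str]] = []
    start = 0
    for crops in crops_per_image:
        end = start + len(crops)
        results.append(texts[start:end])
        start = end
    return results


# Stand-in recognizer (assumption): counts calls and upper-cases strings.
calls = {"count": 0}


def counting_recognizer(batch: Sequence) -> List[str]:
    calls["count"] += 1
    return [str(item).upper() for item in batch]


# Three "images": two crops, no crops, one crop.
results = recognize_batch([["a", "b"], [], ["c"]], counting_recognizer)
```

Images without any detected boxes still get an empty result list, so the output stays aligned with the input images.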
Obviously, even better would be to do something like: