Drop in mAP after TensorRT optimization #315
Here are all the fixes I made so far:
That sounds good.
I have read through your commit history. I think my current code does not have the issues you've fixed in your own code... I did reference the original AlexeyAB/darknet code to develop my implementation. For example, "scale_x_y", which is used in the yolov4/yolov4-tiny models, affects how the center x/y coordinates of bboxes are calculated, and I implemented that calculation in the "yolo_layer" plugin (tensorrt_demos/plugins/yolo_layer.cu, lines 238 to 239, at commit 793d7ae).
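For reference, a minimal NumPy sketch of that scale_x_y center decoding as described in AlexeyAB/darknet (function and variable names here are illustrative, not the plugin's actual identifiers):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def decode_center(tx, ty, col, row, grid_w, grid_h, scale_x_y=1.05):
    """Decode bbox center coordinates, normalized to [0, 1].

    tx, ty:     raw network outputs for the grid cell at (col, row)
    scale_x_y:  darknet's scale_x_y parameter (1.0 for yolov3;
                e.g. 1.05 / 1.1 / 1.2 for the yolov4 heads)
    """
    # scale_x_y stretches the sigmoid output around 0.5 so the
    # predicted center can actually reach the cell borders.
    bx = (sigmoid(tx) * scale_x_y - 0.5 * (scale_x_y - 1.0) + col) / grid_w
    by = (sigmoid(ty) * scale_x_y - 0.5 * (scale_x_y - 1.0) + row) / grid_h
    return bx, by
```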
I will probably have time this weekend to cross-check the implementations. I will get back to you when I have more info.
@philipp-schmidt Looking forward to your updates. Meanwhile, I'm inclined to think the problem more likely lies in the darknet -> onnx -> TensorRT conversion. I will also review the code when I have time.
Hi, a major source of wrong results and bad accuracy has been fixed for me in Triton Inference Server. It was a server-side race condition... I was hunting ghosts for many weeks: triton-inference-server/server#2339. Now I can focus on mAP; I'll keep you posted.
NVIDIA has this Polygraphy tool, which can be used to compare layer-wise outputs between an ONNX model and the corresponding TensorRT engine. I think that would be an effective way to debug this mAP-drop problem. Here is an example of Polygraphy debugging output: NVIDIA/TensorRT#1087 (comment). I'm not sure when I'll have time to look into this, though.
Unfortunately, I haven't yet been able to make the time to fully tackle this either.
NVIDIA's Polygraphy tool turns out to be very easy to use. I just followed the installation instructions and used the following command to debug the models.
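A sketch of the kind of invocation (the ONNX file name is illustrative; `mark all` asks Polygraphy to compare every layer's output, not just the final ones):

```
polygraphy run yolov3-tiny-416.onnx --trt --onnxrt --fp16 \
    --trt-outputs mark all --onnx-outputs mark all
```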
I summarize the results below. All comparisons are done between TensorRT FP16 and ONNX Runtime.
I am guessing this is where the loss of accuracy occurs? Will there be a fix?
Interesting results, thanks for checking it out, jkjung. I'm curious whether there are any guarantees from TensorRT regarding precision. And taking into account that TensorRT selects from a range of different implementations for each layer, the next question is: will this accuracy drop be reproducible and consistent across different hardware?
I re-ran Polygraphy by specifying the correct input data range for the yolo models ("--float-min 0.0 --float-max 1.0"), e.g.
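For example (the model file name is illustrative; the input-range flags are the ones quoted above):

```
polygraphy run yolov3-tiny-416.onnx --trt --onnxrt --fp16 \
    --float-min 0.0 --float-max 1.0
```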
Here are the results (FP16):
The TensorRT "yolov3-tiny" FP16 engine is the only one which generates an output with >5% mean relative error vs. onnxruntime (all the others are <1%). I think this indeed explains why the TensorRT "yolov3-tiny" engine evaluates to a much worse mAP than its DarkNet counterpart, compared to the other models ("yolov3-608", "yolov4-tiny-416" and "yolov4-608")...
Hello, sorry for not adding anything to the discussion, but I wanted to check: I'm currently trying to deploy this repository on a Jetson Nano. Does the yolov4-tiny model also show the mAP drop that has been discussed mainly for yolov3? If this is unclear, I will conduct my own tests on a custom dataset and report the results back to you.
Based on my mAP evaluation results, "yolov3-tiny" suffers from this problem quite a bit. The other models ("yolov3", "yolov4-tiny" and "yolov4") are probably OK. I would focus on solving the problem for "yolov3-tiny" when I have time.
@jkjung-avt Same problem for the yolov4-mish and yolov4-csp-swish models as well; I'm getting lots of false positives and the results are not the same as darknet's. May I know the reasons behind this, and how can we solve the false-positive problem?
@akashAD98 This is a known issue. I've done my best to make sure the code is correct for both TensorRT engine building and inference. But TensorRT engine optimization does result in an mAP drop for various YOLO models. I have also tried to analyze this problem with Polygraphy, as shown above, but failed to find the root cause or a solution. I don't have a good answer now. That's why I've kept this issue open...
@jkjung-avt Thanks for your kind reply. We all appreciate your great work. I hope you will find a solution in the future.
@jkjung-avt Can we do inference and check the FPS and false predictions of the ONNX model? What do you think about the accuracy (false predictions), is it the same as TensorRT's?
I have done that for MODNet, but not for the YOLO models. Some of the code could be reused, though: https://github.com/jkjung-avt/tensorrt_demos/blob/master/modnet/test_onnx.py In order to check mAP and false detections with the ONNX YOLO models, you'll also have to implement the "yolo" layers in the post-processing code (this part is handled by the "yolo_layer" plugin in the TensorRT case). I don't think I'll have time to do that in the near future...
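For anyone attempting this, a rough NumPy sketch of what such a "yolo" decode could look like for one output head; the tensor layout, anchor format, and normalization here are assumptions that must be matched to the actual ONNX model:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def decode_yolo_head(output, anchors, input_size, scale_x_y=1.0):
    """Decode one raw YOLO head of shape (num_anchors*(5+C), H, W)
    into (N, 6) detections: [x, y, w, h, confidence, class_id],
    with x/y/w/h normalized to [0, 1]."""
    num_anchors = len(anchors)
    c, h, w = output.shape
    num_classes = c // num_anchors - 5
    out = output.reshape(num_anchors, 5 + num_classes, h, w)

    cols, rows = np.meshgrid(np.arange(w), np.arange(h))
    dets = []
    for a, (aw, ah) in enumerate(anchors):
        # center x/y with scale_x_y, as in the yolo_layer plugin
        bx = (sigmoid(out[a, 0]) * scale_x_y - 0.5 * (scale_x_y - 1) + cols) / w
        by = (sigmoid(out[a, 1]) * scale_x_y - 0.5 * (scale_x_y - 1) + rows) / h
        # width/height from exp() scaled by the anchor (anchors in input pixels)
        bw = np.exp(out[a, 2]) * aw / input_size
        bh = np.exp(out[a, 3]) * ah / input_size
        obj = sigmoid(out[a, 4])
        cls = sigmoid(out[a, 5:])        # per-class sigmoid, as in darknet
        cls_id = cls.argmax(axis=0)
        conf = obj * cls.max(axis=0)
        dets.append(np.stack([bx, by, bw, bh, conf,
                              cls_id.astype(float)], axis=-1).reshape(-1, 6))
    return np.concatenate(dets, axis=0)
```

Confidence thresholding and NMS would still have to be applied on top, as in the TensorRT demo code.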
Hi @jkjung-avt, do you have any idea how I should solve this issue? onnx/tutorials#253 (comment) This is the script: inference_onnx_yolov4-mish.ipynb.txt
@jkjung-avt Please have a look.
@akashAD98 I already commented: onnx/tutorials#253 (comment) You need to modify the post-processing code yourself.
@jkjung-avt Is there any model which produces almost the same results as darknet? yolov4-csp and yolov4-mish have false-prediction issues, so I'm looking for a good TensorRT model. Is yolov4 the best?
Please refer to the "mAP and FPS" table in Demo #5: YOLOv4.
@jkjung-avt One observation from my experiments: I repeated the experiments with a few more category classes, and found that the model gives fewer false positives when there are more classes and more false positives when there are fewer classes. This is just my experimental observation; if you think it can help us solve this issue, please let us know. Thanks.
@akashAD98 Thanks for sharing the info. I tried to think of possible causes of such results but could not come up with any. I will keep this in mind and share my experience/thoughts when I have new findings.
Hello @jkjung-avt, For my use case, I am trying to detect only one type of object (a single class) with yolov3. After comparing with the code of yolov3 (https://github.com/experiencor/keras-yolo3), I observe that there is a major difference in how the output class probabilities are processed. In the original code (https://github.com/experiencor/keras-yolo3/blob/master/utils/utils.py at line 179), they apply a softmax to all class probabilities. In your code (https://github.com/jkjung-avt/tensorrt_demos/blob/master/plugins/yolo_layer.cu at line 167), you post-process the class probabilities with a sigmoid. In my case, with only one class, a softmax over a single logit always yields 1.0, while a sigmoid does not, so the two post-processings differ the most.
I think this can explain why the mAP is better with more classes (because the softmax becomes more similar to the sigmoid). Thank you for your contribution with tensorrt_demos,
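To illustrate the point numerically (a toy example, not code from either repo), with one class the softmax saturates at 1.0 while the sigmoid still tracks the logit:

```python
import numpy as np

logit = np.array([-2.0])  # raw class score for the single class

sigmoid = 1.0 / (1.0 + np.exp(-logit))         # ~0.119: varies with the logit
softmax = np.exp(logit) / np.exp(logit).sum()  # 1.0: only one class to normalize over

print(sigmoid, softmax)
```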
@ThomasGoud Thanks for sharing your thoughts. But according to the original DarkNet implementation, the objectness and class scores are calculated by applying the LOGISTIC (i.e. sigmoid) activation to the outputs of the previous convolutional layers. You could refer to the source code, as pointed out below.
@jkjung-avt Hi, could we work together on the problem of the reduced accuracy? I believe I have similar issues in my implementation, and I do not use any ONNX conversion whatsoever. I would like to get this fixed and could use additional examples of where it goes wrong to determine the cause. We could start by working on the post-processing method. I started with existing code for the yolo layer plugin, similar to yours, and have already had to fix a few errors. Please let me know if my code increases your precision:
https://github.com/isarsoft/yolov4-triton-tensorrt/blob/master/clients/python/processing.py