Deviation in outputs of nodes between TFLM and Tflite (Python) #3046

Open
ShardulNalegave opened this issue Feb 3, 2025 · 2 comments

Comments

@ShardulNalegave

Note: This issue continues a previously filed issue (#3039), following further investigation that uncovered additional problems.

I am trying to run an int8-quantized MobileNetV2 model on an ESP32S3-Eye but am running into multiple issues.
This thread covers the deviation in node outputs; the previously linked thread covers the problems with allocating the Softmax node and the incorrect value of kBeta.

As the model kept giving incorrect results, I decided to inspect the output values of every node in the model and found something interesting: at node 11 (a DepthwiseConv2D node), a few values differ between the Python run and the TFLM run.

Note that I'm only printing the first 100 values of each output, so this does not show that the deviation starts at this node. It may well have started earlier and only become visible within the first 100 values here.
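
Since looking at only the first 100 values can hide where the divergence really begins, one option is to compare the full output tensors element by element. Below is a minimal sketch, assuming the int8 outputs of a node from both runs have already been exported as flat lists of integers. The function name `compare_outputs` and the sample values are hypothetical and not taken from the linked repo.

```python
def compare_outputs(tflm, reference):
    """Compare two flat int8 tensors and summarize how they differ."""
    if len(tflm) != len(reference):
        raise ValueError("tensors have different lengths")
    # Absolute per-element difference between the two runs.
    diffs = [abs(a - b) for a, b in zip(tflm, reference)]
    mismatches = [i for i, d in enumerate(diffs) if d != 0]
    return {
        "total": len(diffs),
        "mismatches": len(mismatches),
        "first_mismatch": mismatches[0] if mismatches else None,
        "max_abs_diff": max(diffs, default=0),
    }


# Made-up values for illustration only, not real node outputs.
tflm_out = [12, -7, 33, 0, 90]
python_out = [12, -8, 33, 0, 92]
summary = compare_outputs(tflm_out, python_out)
print(summary)
```

Running this for every node in order would show the first node with a non-zero mismatch count and whether `max_abs_diff` grows with depth.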

Evidence:

  1. TFLM
Image Image

At the top are the first 100 values fed into the interpreter's input tensor; at the bottom are the first 100 values of node 11's output.

  2. Python
Image

Similarly, the top is the input and the bottom is the output of node 11.
Here I have also printed the final predictions, which are correct.

These differences may look small, but they add up quickly. At node 62 (Mean) I get the following outputs:
Image

Left: TFLM
Right: Python

Notice that both the number of deviating values and the size of the gap have increased.

Additional Info

Although I'm using esp-tflite-micro, which replaces the TFLM kernels with its own esp-nn-based kernels for much better performance, the issue persists even after disabling all optimizations and custom kernels.

I have uploaded my code (esp-idf and python) along with the test image and model on GitHub.
Check it here: https://github.com/ShardulNalegave/esp-mbnetv2-test

Note: The linked GitHub repo includes the kBeta override fix mentioned in #3039.

ShardulNalegave added a commit to ShardulNalegave/esp-mbnetv2-test that referenced this issue Feb 3, 2025
@ShardulNalegave
Author

The deviation is significant enough that the FullyConnected node after Mean returns an output with all values equal to -60. This is then passed to Softmax, and I get identical confidence scores (0.996094) for every class, no matter what input I give.

I have tried both the correct input and randomly generated numbers; both produce the same output.
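
The reported score of 0.996094 matches 255/256, which is what the largest int8 value (127) dequantizes to when the output uses a scale of 1/256 and a zero point of -128. Those are the usual int8 Softmax output parameters in TFLite, but they are assumed here rather than read from this model. A quick check of that arithmetic:

```python
def dequantize(q, scale, zero_point):
    """Map a quantized integer back to a real value: scale * (q - zero_point)."""
    return scale * (q - zero_point)


# Assumed Softmax output quantization; check the model's actual tensor parameters.
SCALE = 1.0 / 256.0
ZERO_POINT = -128

saturated = dequantize(127, SCALE, ZERO_POINT)
print(f"{saturated:.6f}")  # prints 0.996094
```

If that assumption holds, every class score is sitting at the int8 maximum, which would suggest the Softmax output is saturating rather than producing a genuine distribution.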

To verify that the deviation is what causes this issue, I used the Python code to extract what the inputs to the FullyConnected node should be for my test image, dumped them into a .bin file, and converted that file into a C array using xxd -i.

I then used this C array to manually override the inputs to the FullyConnected node in the FullyConnectedEval method of its kernel file, and ran my program to check the outputs from this node onwards.

This actually gave me the correct final output!
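
The export step described above can be sketched roughly as follows. This is an illustration only: the helper names `dump_int8` and `to_c_array`, the file name, and the sample values are hypothetical, and the C formatting merely resembles what `xxd -i` produces rather than reproducing it exactly.

```python
import os
import tempfile


def dump_int8(values, path):
    """Write int8 values (-128..127) to a raw binary file, one byte each."""
    data = bytes(v & 0xFF for v in values)  # two's-complement byte per value
    with open(path, "wb") as f:
        f.write(data)
    return data


def to_c_array(data, name):
    """Format raw bytes as a C array, similar in spirit to `xxd -i` output."""
    rows = []
    for i in range(0, len(data), 12):
        rows.append("  " + ", ".join(f"0x{b:02x}" for b in data[i:i + 12]))
    body = ",\n".join(rows)
    return (
        f"const unsigned char {name}[] = {{\n{body}\n}};\n"
        f"const unsigned int {name}_len = {len(data)};\n"
    )


# Made-up values standing in for the expected FullyConnected input tensor.
fc_input = [-128, -1, 0, 1, 127]
bin_path = os.path.join(tempfile.gettempdir(), "fc_input.bin")
raw = dump_int8(fc_input, bin_path)
c_source = to_c_array(raw, "fc_input")
print(c_source)
```

The array holds unsigned bytes, so on the device it would need to be reinterpreted as `int8_t` before being copied over the tensor data.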

@ShardulNalegave
Author

Node 11 was indeed not the first node with a deviation.
The deviation is already present at Node 1.

Attaching results from my analysis.

node0.results.csv
node11.results.csv
