Looking for selective post-training quantization for 8-bit weights and 16-bit activations #395

Open
gaikwadrahul8 opened this issue Nov 28, 2024 · 2 comments

@gaikwadrahul8
System information

TensorFlow version (you are using): TF 2.13.0
Are you willing to contribute it (Yes/No): No
Describe the feature and the current behavior/state.

Dear TF developers, I'm currently experimenting with PTQ using 8-bit weights and 16-bit activations (W8A16), and I've gotten great results. However, after some experimentation I have identified that only a certain part of my network requires the 16-bit activations. In other words, using 16-bit activations for the entire model is sub-optimal for my use case.

Hence, I'm looking for a way to selectively quantize one part of my model to 8-bit weights and activations (W8A8) and the other part to W8A16.

In the current state, would this be possible somehow?

Who will benefit from this feature?
Platforms that support mixed-precision execution of activations.

Any Other info.
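
For reference, a minimal sketch of the whole-model W8A16 PTQ flow described above, assuming the standard TFLite converter 16x8 quantization mode (the SavedModel path and representative dataset below are placeholders):

```python
import tensorflow as tf

# Placeholder path: point this at your own SavedModel.
saved_model_dir = "/path/to/saved_model"

def representative_dataset():
    # Placeholder calibration data; yield samples shaped like the model input.
    for _ in range(100):
        yield [tf.random.uniform((1, 224, 224, 3), dtype=tf.float32)]

converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
# 16x8 mode: 8-bit weights, 16-bit activations for the entire graph.
converter.target_spec.supported_ops = [
    tf.lite.OpsSet.EXPERIMENTAL_TFLITE_BUILTINS_ACTIVATIONS_INT16_WEIGHTS_INT8
]
tflite_model = converter.convert()
```

What I'm missing is a way to apply this 16x8 mode only to a selected subset of the graph while keeping the rest at W8A8.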

@gaikwadrahul8

This issue, originally reported by @Hrayo712, has been moved to this dedicated ai-edge-torch repository to improve issue tracking and prioritization. To ensure continuity, we have created this new issue on your behalf.

We appreciate your understanding and look forward to your continued involvement.

@pkgoogle
Contributor

pkgoogle commented Dec 2, 2024

Original Issue: tensorflow/tensorflow#61720
