-
Not exactly what you are describing, but there is something somewhat similar: lookup decoding (#6828), which uses n-grams and a corpus of text as a reference.
-
[Originally posted here: https://github.com/turboderp-org/exllamav2/issues/737#issuecomment-2676995529 but it seems that was the wrong level for the idea]
I had an idea that should make it possible to get some of the benefits of speculative decoding basically for free, or even better. My use case is specifically code, but it should be useful beyond that.

When you ask a model to change some code, a lot of time is typically spent echoing unchanged code back at you. This happens any time you ask a model to alter something you gave it: the output can contain dozens or even hundreds of tokens that exactly match a sequence from the prior context before the model reaches the place where it actually wants to make changes.

In cases like this, it should be possible to mimic speculative-decoding functionality by algorithmically feeding the model the next tokens from a recognized sequence. This could be a huge speedup, because we can realistically predict a dozen or more tokens into the future with a reasonable chance that they are all correct choices. And it is nearly free, or may even improve energy efficiency, because no inference is needed to produce the draft predictions.
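A minimal sketch of what I mean, assuming a plain list of token ids (the function name, `ngram_size`, and `max_draft` are all hypothetical knobs, not anything that exists in the codebase): match the last few generated tokens against earlier occurrences in the context and propose the tokens that followed as a free draft.

```python
def lookup_draft(tokens, ngram_size=3, max_draft=12):
    """Propose draft tokens by matching the trailing `ngram_size` tokens
    against earlier occurrences in the context. Returns [] to decline."""
    if len(tokens) < ngram_size:
        return []
    tail = tokens[-ngram_size:]
    # Search backwards so the most recent prior occurrence wins.
    for start in range(len(tokens) - ngram_size - 1, -1, -1):
        if tokens[start:start + ngram_size] == tail:
            continuation = tokens[start + ngram_size:
                                  start + ngram_size + max_draft]
            if continuation:
                return continuation
    return []  # no match found: don't speculate this step
```

The drafted tokens would then be verified in a single forward pass exactly as in ordinary speculative decoding, so a wrong guess costs almost nothing.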
If this idea were developed, it would be nice if it were customizable via a scriptable function that receives the token sequence and either returns speculative tokens or declines to predict any. That would make it easier to develop more advanced algorithmic speculative decoders.
It would also be nice if the scriptable function could force tokens to be chosen; that way you kill two birds with one stone, offering both free draft tokens and formatting rules in one place.
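The hook interface I have in mind could be as small as this (a sketch, all names hypothetical, with an assumed token id standing in for a real tokenizer lookup): the function returns a list of tokens plus a flag saying whether they are mandatory or merely draft guesses for the model to verify.

```python
from typing import Sequence

def example_hook(tokens: Sequence[int]) -> tuple[list[int], bool]:
    """Hypothetical speculator hook. Returns (tokens, forced):
    forced=True  -> tokens must be emitted verbatim (formatting rule),
    forced=False -> tokens are draft guesses for the model to verify,
    ([], False)  -> decline to speculate this step."""
    OPEN_BRACE = 123  # assumed token id for "{" in some tokenizer
    if not tokens:
        # Force the reply to open with "{" (e.g. JSON-only output).
        return ([OPEN_BRACE], True)
    # Otherwise make no prediction; the engine decodes normally.
    return ([], False)
```

The engine would call the hook once per step: forced tokens bypass sampling entirely, while unforced tokens go through the usual speculative verification pass.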