Problem with reproducing "strided" attention scheme from the paper #7
Comments
After reading the code in `attention.py`, I find that this codebase only contains separate implementations of the two parts of strided attention, referred to as the "first / second step of strided attention" within it. To get the full pattern, you probably need to implement an integrated version of strided attention yourself, with each head of a two-head sparse self-attention corresponding to one of those two steps.
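A minimal sketch of that idea (not code from the repo; `strided_head_masks` and `stride` are illustrative names) is to build one boolean mask per head, with head 0 covering the local "first step" and head 1 covering the strided "second step":

```python
import numpy as np

def strided_head_masks(n_tokens, stride=4):
    """Per-head causal masks for a two-head strided sparse attention sketch."""
    i = np.arange(n_tokens)[:, None]  # query positions (rows)
    j = np.arange(n_tokens)[None, :]  # key positions (columns)
    causal = j <= i

    # Head 0 ("first step"): attend to the previous `stride` positions.
    head0 = causal & ((i - j) < stride)
    # Head 1 ("second step"): attend to positions a multiple of `stride` back.
    head1 = causal & (((i - j) % stride) == 0)
    return head0.astype(int), head1.astype(int)
```

The union of the two masks gives the overall strided pattern shown in Fig. 3b of the paper.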
@krishnadubba Have you successfully implemented the strided version, by the way? Could you share the code change?
I was able to reproduce the patterns using this function: `def sparse_attention_mask(n_tokens, stride_length=3, c=2): ...`
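The body of that function was not included above; the following is a minimal sketch of one way it could look, assuming NumPy and the union-over-heads patterns from the paper's Fig. 3. It keeps the quoted signature, but the body is an assumption, not the commenter's code:

```python
import numpy as np

def sparse_attention_mask(n_tokens, stride_length=3, c=2):
    """Sketch: combined (union-over-heads) masks for the strided and fixed
    patterns from the Sparse Transformers paper. Returns two (n_tokens,
    n_tokens) arrays with 1 where query i may attend to key j (j <= i)."""
    i = np.arange(n_tokens)[:, None]  # query positions
    j = np.arange(n_tokens)[None, :]  # key positions
    causal = j <= i

    # Strided pattern (Fig. 3b): a local band of width `stride_length`
    # plus every `stride_length`-th previous position.
    local = (i - j) < stride_length
    strided = ((i - j) % stride_length) == 0
    strided_mask = (causal & (local | strided)).astype(int)

    # Fixed pattern (Fig. 3c): positions in the same block of size
    # `stride_length`, plus the last `c` columns of every block.
    same_block = (i // stride_length) == (j // stride_length)
    summary_cols = (j % stride_length) >= (stride_length - c)
    fixed_mask = (causal & (same_block | summary_cols)).astype(int)

    return strided_mask, fixed_mask
```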
Hi,
I am trying to visualize the attention schemes using this code. Basically, I am trying to reproduce Fig. 3 from the paper. I could reproduce the "fixed" attention scheme as shown below:
The problem is that I could not reproduce the "strided" scheme (Fig. 3b from the paper). No matter what parameters I try, all I get is the following:
If I change some of the code, I can get the correct "strided" version as shown in the paper. The following is the result after those code changes:
Did anyone face the same issue?
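For reference, a minimal way to render such a 0/1 mask in the style of Fig. 3, assuming matplotlib and a precomputed mask array (e.g. from a helper like the sketches above; `show_mask` is a hypothetical name):

```python
import matplotlib.pyplot as plt
import numpy as np

def show_mask(mask, title):
    # Dark cells mark the key positions each query may attend to,
    # with queries on the rows and keys on the columns, as in Fig. 3.
    plt.imshow(np.asarray(mask), cmap="gray_r")
    plt.title(title)
    plt.xlabel("key position")
    plt.ylabel("query position")
    plt.show()
```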