Add support for 4D custom attention masks in GPT-2 #35517

Open · wants to merge 14 commits into main
Conversation

sambhavnoobcoder

Problem Statement

Currently, GPT-2's attention mechanism only supports 2D attention masks, which limits its flexibility for advanced use cases such as packed sequence processing. When users pass a 4D attention mask (shape [batch_size, num_heads, seq_length, seq_length]), the model raises a dimension-mismatch error.

Issue #35290 demonstrates this limitation when trying to process packed sequences with custom attention patterns.
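
For reference, a minimal reproduction of the failure might look like the sketch below (the checkpoint, mask convention, and exact error depend on the installed transformers version; the 1 = attend / 0 = block convention is an assumption, not taken from the issue):

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

inputs = tokenizer("Packed sequences need custom masks", return_tensors="pt")
batch_size, seq_len = inputs["input_ids"].shape

# Custom causal 4D mask, broadcast over a single head dimension:
# 1.0 = attend, 0.0 = block.
mask_4d = torch.tril(torch.ones(seq_len, seq_len)).view(1, 1, seq_len, seq_len)
mask_4d = mask_4d.expand(batch_size, 1, seq_len, seq_len)

# Prior to this change, GPT-2's mask handling assumed a 2D
# [batch_size, seq_len] padding mask and could fail with a
# shape-mismatch error when handed a 4D tensor like this one.
outputs = model(input_ids=inputs["input_ids"], attention_mask=mask_4d)
```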

Proposed Solution

Extend GPT-2's attention mechanism to properly handle both 2D and 4D attention masks while maintaining backward compatibility. This allows for:

  • Direct support for packed sequence processing (see the mask-construction sketch after this list)
  • More flexible attention patterns
  • Compatibility with existing 2D mask implementations
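
As an illustration of the packed-sequence use case, a block-diagonal causal 4D mask keeps each packed segment from attending across segment boundaries. The helper `build_packed_causal_mask` below is hypothetical and not part of this PR:

```python
import torch

def build_packed_causal_mask(segment_lengths, dtype=torch.float32):
    """Build a causal, block-diagonal 4D mask for sequences packed into one row.

    Each packed segment may only attend to earlier tokens within its own
    segment. Returns a tensor of shape [1, 1, total_len, total_len] with
    1.0 where attention is allowed and 0.0 where it is blocked.
    """
    total_len = sum(segment_lengths)
    mask = torch.zeros(total_len, total_len, dtype=dtype)
    start = 0
    for length in segment_lengths:
        end = start + length
        # Causal mask restricted to this segment's block on the diagonal.
        mask[start:end, start:end] = torch.tril(torch.ones(length, length, dtype=dtype))
        start = end
    return mask.view(1, 1, total_len, total_len)

# Two sequences of lengths 3 and 2 packed into a single row of length 5.
packed_mask = build_packed_causal_mask([3, 2])
print(packed_mask[0, 0])
```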

Implementation Details

The changes focus on the GPT2Attention class, specifically:

  1. Updated attention mask handling in the forward pass to accept both 2D and 4D masks (a rough sketch follows this list)
  2. Maintained compatibility with existing 2D attention masks
  3. Preserved causal attention behavior when required
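
The actual changes live in modeling_gpt2.py; the snippet below is only a rough sketch of the dimension-aware mask handling described in point 1, assuming the 1 = attend / 0 = block convention. Function and variable names are illustrative, not the real diff:

```python
import torch

def prepare_attention_mask(attention_mask, dtype):
    """Illustrative sketch of dimension-aware mask handling (not the actual diff).

    A 2D [batch, seq_len] padding mask is expanded to the broadcastable
    4D shape [batch, 1, 1, seq_len]; a 4D mask is used as-is. Either way,
    the 0/1 mask is converted to the additive form added to the logits.
    """
    if attention_mask is None:
        return None
    if attention_mask.dim() == 2:
        # Classic padding mask: broadcast over heads and query positions.
        attention_mask = attention_mask[:, None, None, :]
    elif attention_mask.dim() != 4:
        raise ValueError(f"Expected a 2D or 4D attention mask, got {attention_mask.dim()}D")
    # Convert 1 -> 0.0 (keep) and 0 -> large negative (mask out).
    attention_mask = attention_mask.to(dtype)
    return (1.0 - attention_mask) * torch.finfo(dtype).min
```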

Testing Strategy

Added a comprehensive test suite (test_modeling_4D_attention_mask.py) that verifies:

  • Shape compatibility with 4D masks
  • Correctness of attention patterns
  • Consistency between 2D and 4D mask results (an illustrative version of this check is sketched after this list)
  • Causal attention preservation
  • Batch processing consistency
  • Edge cases (empty sequences, single tokens, maximum length)
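
For instance, the 2D/4D consistency check could be written along these lines. This is not the contents of test_modeling_4D_attention_mask.py, just a sketch of the idea, and it assumes that a 0/1 causal 4D mask is accepted after the fix:

```python
import torch
from transformers import GPT2LMHeadModel

def test_2d_and_4d_masks_agree():
    """A full-attention 2D padding mask and its causal 4D equivalent
    should produce (near-)identical logits."""
    model = GPT2LMHeadModel.from_pretrained("gpt2").eval()
    input_ids = torch.randint(0, model.config.vocab_size, (1, 8))
    seq_len = input_ids.shape[1]

    mask_2d = torch.ones(1, seq_len)  # no padding
    # Equivalent causal 4D mask, broadcast over the head dimension.
    mask_4d = torch.tril(torch.ones(seq_len, seq_len)).view(1, 1, seq_len, seq_len)

    with torch.no_grad():
        out_2d = model(input_ids=input_ids, attention_mask=mask_2d).logits
        out_4d = model(input_ids=input_ids, attention_mask=mask_4d).logits

    torch.testing.assert_close(out_2d, out_4d, atol=1e-4, rtol=1e-4)
```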

Test Results

All tests pass locally. [Screenshot of test results attached, 2025-01-06]

Impact and Benefits

This enhancement:

  1. Enables efficient packed sequence processing
  2. Provides more flexibility in attention pattern design
  3. Maintains backward compatibility
  4. Improves model versatility without performance overhead

Validation

  • ✅ New test suite validates 4D mask functionality
  • ✅ Backward compatible with existing 2D masks
  • ✅ No performance regression

Related Issues

Closes #35290 - Support for 4D attention masks in GPT-2

Additional Notes

  • No breaking changes introduced
  • Existing model weights remain compatible
  • Performance impact is negligible

Requested reviewer: @ArthurZucker

sambhavnoobcoder changed the title from "Add support for 4D attention masks in GPT-2" to "Add support for 4D custom attention masks in GPT-2" on Jan 5, 2025