
Restormer Implementation #8312

Open · wants to merge 24 commits into base: dev
Conversation

phisanti

Fixes # .

Description

This PR implements the Restormer architecture for high-resolution image restoration in MONAI following the discussion in issue #8261. The implementation supports both 2D and 3D images using MONAI's convolution as the base. Key additions include:

  • Downsample class for efficient downsampling operations
  • pixel_unshuffle operation complementing existing pixel_shuffle
  • Channel Attention Block (CABlock) with FeedForward layer
  • Multi-DConv Head Transposed Self-Attention (MDTA)
  • OverlapPatchEmbed class
  • Comprehensive unit tests for all new components

The implementation follows MONAI's coding patterns and includes performance validations against native PyTorch operations where applicable.
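For context on the `pixel_unshuffle` addition: a minimal 2D sketch of the operation (MONAI's version generalizes it to 3D; the function name and shape handling here are illustrative), checked against PyTorch's built-in `torch.nn.functional.pixel_unshuffle`:

```python
import torch
import torch.nn.functional as F

def pixel_unshuffle_2d(x: torch.Tensor, factor: int) -> torch.Tensor:
    """Rearrange (B, C, H, W) into (B, C*factor**2, H/factor, W/factor)."""
    b, c, h, w = x.shape
    if h % factor or w % factor:
        raise ValueError(f"Spatial dims {(h, w)} must be divisible by {factor}.")
    # Split each spatial dim into (size // factor, factor), then move the
    # factor axes next to the channel axis before flattening them into it.
    x = x.reshape(b, c, h // factor, factor, w // factor, factor)
    x = x.permute(0, 1, 3, 5, 2, 4)
    return x.reshape(b, c * factor**2, h // factor, w // factor)

x = torch.randn(2, 3, 8, 8)
assert torch.equal(pixel_unshuffle_2d(x, 2), F.pixel_unshuffle(x, 2))
```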

Types of changes

  • Non-breaking change (fix or new feature that would not break existing functionality).
  • Breaking change (fix or new feature that would cause existing functionality to change).
  • New tests added to cover the changes.
  • Integration tests passed locally by running ./runtests.sh -f -u --net --coverage.
  • Quick tests passed locally by running ./runtests.sh --quick --unittests --disttests.
  • In-line docstrings updated.
  • Documentation updated, tested make html command in the docs/ folder.

@ericspod (Member) left a comment:
Looks good overall but I had a few inline comments, and we should have full docstrings everywhere appropriate. For any classes meant for general-purpose use (i.e. not just by Restormer) please ensure they have docstring descriptions for the arguments (at the very least for constructor args). Thanks!

See: Aitken et al., 2017, "Checkerboard artifact free sub-pixel convolution".

Args:
    x: Input tensor

Here we should specifically state that x has shape BCHW[D].


if any(d % factor != 0 for d in input_size[2:]):
    raise ValueError(
        f"All spatial dimensions must be divisible by factor {factor}. " f"Got spatial dimensions: {input_size[2:]}"
    )
Suggested change:
-        f"All spatial dimensions must be divisible by factor {factor}. " f"Got spatial dimensions: {input_size[2:]}"
+        f"All spatial dimensions must be divisible by {factor}, spatial shape is: {input_size[2:]}"

Maybe a little shorter?
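The validation being discussed can be exercised in isolation; a minimal sketch (the helper name `check_divisible` is hypothetical, but the check itself mirrors the diff, taking spatial dims as everything after batch and channel so it covers both 2D and 3D inputs):

```python
import torch

def check_divisible(x: torch.Tensor, factor: int) -> None:
    # Spatial dims are everything after batch and channel (works for 2D and 3D).
    spatial = tuple(x.shape[2:])
    if any(d % factor != 0 for d in spatial):
        raise ValueError(f"All spatial dimensions must be divisible by {factor}, spatial shape is: {spatial}")

check_divisible(torch.empty(1, 4, 8, 8), 2)      # ok: 8 and 8 both divisible by 2
try:
    check_divisible(torch.empty(1, 4, 8, 9), 2)  # width 9 is not divisible by 2
except ValueError as e:
    print(e)
```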

kernel_size_ = ensure_tuple_rep(kernel_size, spatial_dims)
padding = tuple((k - 1) // 2 for k in kernel_size_)

if down_mode == "conv":

Suggested change:
-        if down_mode == "conv":
+        if down_mode == DownsampleMode.CONV:

bias=bias,
),
)
elif down_mode == "convgroup":

Suggested change:
-        elif down_mode == "convgroup":
+        elif down_mode == DownsampleMode.CONVGROUP:

if post_conv:
self.add_module("postconv", post_conv)

elif down_mode == "pixelunshuffle":

Suggested change:
-        elif down_mode == "pixelunshuffle":
+        elif down_mode == DownsampleMode.PIXELUNSHUFFLE:
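The reason these suggestions are drop-in replacements: MONAI's mode enums subclass both `str` and `Enum`, so comparing against the enum member still accepts callers who pass a plain string. A minimal sketch (the `DownsampleMode` values here are assumed from the string literals in the diff, not taken from MONAI's source):

```python
from enum import Enum

class DownsampleMode(str, Enum):
    # Hypothetical members mirroring the strings used in the PR.
    CONV = "conv"
    CONVGROUP = "convgroup"
    PIXELUNSHUFFLE = "pixelunshuffle"

# Because the enum subclasses str, equality works in both directions,
# whether callers pass a bare string or the enum member itself.
assert DownsampleMode.CONV == "conv"
assert "pixelunshuffle" == DownsampleMode.PIXELUNSHUFFLE
```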

Comment on lines +68 to +72
"""Multi-DConv Head Transposed Self-Attention (MDTA): Differs from standard self-attention
by operating on feature channels instead of spatial dimensions. Incorporates depth-wise
convolutions for local mixing before attention, achieving linear complexity vs quadratic
in vanilla attention. Based on SW Zamir, et al., 2022 <https://arxiv.org/abs/2111.09881>"""
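The linear-complexity claim in this docstring comes from taking attention over channels rather than spatial positions. A minimal sketch of just the transposed-attention core, following the Restormer paper (the depth-wise convolutions, projections, and learnable temperature of the actual MDTA block are omitted):

```python
import torch
import torch.nn.functional as F

def transposed_attention(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor,
                         temperature: float = 1.0) -> torch.Tensor:
    """Attention across channels: the attention map is (C_head x C_head), so the
    cost grows linearly in the number of spatial positions N rather than as N^2."""
    # q, k, v: (B, heads, C_head, N), where N = H*W (or H*W*D) flattened positions.
    q = F.normalize(q, dim=-1)
    k = F.normalize(k, dim=-1)
    attn = (q @ k.transpose(-2, -1)) * temperature  # (B, heads, C_head, C_head)
    attn = attn.softmax(dim=-1)
    return attn @ v                                  # (B, heads, C_head, N)

b, heads, c_head, n = 2, 4, 16, 64 * 64
q, k, v = (torch.randn(b, heads, c_head, n) for _ in range(3))
out = transposed_attention(q, k, v)
assert out.shape == (b, heads, c_head, n)
```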

We should have a full docstring here describing the arguments for the constructor, and in the previous class.

Comment on lines +51 to +70
class OverlapPatchEmbed(nn.Module):
    """Initial feature extraction using overlapped convolutions.
    Unlike standard patch embeddings that use non-overlapping patches,
    this approach maintains spatial continuity through 3x3 convolutions."""

    def __init__(self, spatial_dims: int, in_c: int = 3, embed_dim: int = 48, bias: bool = False):
        super().__init__()
        self.proj = Convolution(
            spatial_dims=spatial_dims,
            in_channels=in_c,
            out_channels=embed_dim,
            kernel_size=3,
            strides=1,
            padding=1,
            bias=bias,
            conv_only=True,
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.proj(x)
Suggested change:
-class OverlapPatchEmbed(nn.Module):
-    """Initial feature extraction using overlapped convolutions.
-    Unlike standard patch embeddings that use non-overlapping patches,
-    this approach maintains spatial continuity through 3x3 convolutions."""
-
-    def __init__(self, spatial_dims: int, in_c: int = 3, embed_dim: int = 48, bias: bool = False):
-        super().__init__()
-        self.proj = Convolution(
-            spatial_dims=spatial_dims,
-            in_channels=in_c,
-            out_channels=embed_dim,
-            kernel_size=3,
-            strides=1,
-            padding=1,
-            bias=bias,
-            conv_only=True,
-        )
-
-    def forward(self, x: torch.Tensor) -> torch.Tensor:
-        return self.proj(x)
+class OverlapPatchEmbed(Convolution):
+    """
+    Initial feature extraction using overlapped convolutions. Unlike standard patch embeddings
+    that use non-overlapping patches, this approach maintains spatial continuity through 3x3 convolutions.
+    """
+
+    def __init__(self, spatial_dims: int, in_c: int = 3, embed_dim: int = 48, bias: bool = False):
+        super().__init__(
+            spatial_dims=spatial_dims,
+            in_channels=in_c,
+            out_channels=embed_dim,
+            kernel_size=3,
+            strides=1,
+            padding=1,
+            bias=bias,
+            conv_only=True,
+        )

Would it work to inherit directly from Convolution?
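The two patterns in the suggestion above can be compared in miniature, using `nn.Conv2d` as a stand-in for MONAI's `Convolution` (the class names below are illustrative). In the inheritance version `forward` is inherited from the parent, so the wrapper method disappears entirely:

```python
import torch
import torch.nn as nn

# Wrapper pattern (what the PR currently does): hold the conv as a submodule.
class EmbedWrapped(nn.Module):
    def __init__(self, in_c: int = 3, embed_dim: int = 48):
        super().__init__()
        self.proj = nn.Conv2d(in_c, embed_dim, kernel_size=3, stride=1, padding=1, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.proj(x)

# Inheritance pattern (the reviewer's suggestion): the class *is* the conv,
# and forward() is inherited, so no delegation boilerplate is needed.
class EmbedInherited(nn.Conv2d):
    def __init__(self, in_c: int = 3, embed_dim: int = 48):
        super().__init__(in_c, embed_dim, kernel_size=3, stride=1, padding=1, bias=False)

x = torch.randn(1, 3, 32, 32)
assert EmbedWrapped()(x).shape == EmbedInherited()(x).shape == (1, 48, 32, 32)
```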

@ericspod ericspod requested a review from aylward January 24, 2025 13:35