
[REQ] add GigaGAN #111

Open
MarcoRavich opened this issue Oct 1, 2024 · 7 comments
@MarcoRavich

Hi there, thanks for your work!

It would be great to have GigaGAN upscaling too: https://mingukkang.github.io/GigaGAN/

Some possibly useful implementations:

Hope that inspires!

Note: unfortunately, the VideoGigaGAN sources (by Adobe Research) are not (yet?) available...

@WolframRhodium
Contributor

Hi!

One problem with these existing alternative implementations is that they may perform worse than the original. In fact, it seems that none of these provides pre-trained weights.

The second problem is that the GigaGAN model uses text conditioning, which I think is hard to apply to video in general.

@NineMeowICT

NineMeowICT commented Oct 2, 2024

@WolframRhodium
FAL has released usable weights: https://blog.fal.ai/introducing-aurasr-an-open-reproduction-of-the-gigagan-upscaler-2/
For super-resolution the model doesn't require text conditioning. You can refer to this: https://github.com/GreenLandisaLie/AuraSR-ComfyUI
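
For reference, a minimal usage sketch based on the linked blog post — no prompt involved at all. The `fal/AuraSR-v2` repo id is an assumption taken from fal's Hugging Face page:

```python
# Minimal sketch based on the linked fal blog post; the repo id
# "fal/AuraSR-v2" is an assumption taken from fal's Hugging Face page.
from PIL import Image
from aura_sr import AuraSR  # pip install aura-sr

aura_sr = AuraSR.from_pretrained("fal/AuraSR-v2")  # downloads the weights
image = Image.open("input.png").convert("RGB")
upscaled = aura_sr.upscale_4x(image)  # plain 4x image-to-image, no text prompt
upscaled.save("output_4x.png")
```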

@WolframRhodium
Contributor

Thanks for the information.

The model is interesting, but I need to improve the existing vs-mlrt infrastructure to support it. This will take time.

@zelenooki87

@WolframRhodium
Hello. With a little difficulty, I have successfully converted the AuraSR v2 model (an open-source alternative to GigaGAN). Version v2 is quite decent, unlike the first version, which produced many artifacts. The fp32 model would have exceeded 2 GB and wouldn't have fit into a single ONNX file, so I created an fp16 version.

If it's not a problem, could you please take a look at the model structure and the conversion code? The fp16 AuraSR v2 model works perfectly fine in chaiNNer, exactly as it should. However, in Hybrid with vs-mlrt I get a green image when using DirectML, and ONNX and TensorRT modes don't work at all.

I'm sending you the ONNX model, the conversion code, and the error log from building the TensorRT engine with trtexec.
https://mega.nz/folder/8kA2mKzZ#LWahzxk-447JaRSx3Xp_OQ

If we could somehow manage to adapt it for vs-mlrt, that would be excellent!

```
[02/15/2025-20:41:00] [V] [TRT] Searching for input: /upsampler/final_res_block/block2/act/Mul_output_0
[02/15/2025-20:41:00] [V] [TRT] Searching for input: /upsampler/ups.4.0.5/Add_output_0
[02/15/2025-20:41:00] [V] [TRT] /upsampler/final_res_block/Add [Add] inputs: [/upsampler/final_res_block/block2/act/Mul_output_0 -> (-1, 64, 256, 256)[HALF]], [/upsampler/ups.4.0.5/Add_output_0 -> (-1, 64, 256, 256)[HALF]],
[02/15/2025-20:41:00] [V] [TRT] Registering layer: /upsampler/final_res_block/Add for ONNX node: /upsampler/final_res_block/Add
[02/15/2025-20:41:00] [V] [TRT] Registering tensor: /upsampler/final_res_block/Add_output_0 for ONNX tensor: /upsampler/final_res_block/Add_output_0
[02/15/2025-20:41:00] [V] [TRT] /upsampler/final_res_block/Add [Add] outputs: [/upsampler/final_res_block/Add_output_0 -> (-1, 64, 256, 256)[HALF]],
[02/15/2025-20:41:00] [V] [TRT] Static check for parsing node: /upsampler/final_to_rgb/Conv [Conv]
[02/15/2025-20:41:00] [V] [TRT] Parsing node: /upsampler/final_to_rgb/Conv [Conv]
[02/15/2025-20:41:00] [V] [TRT] Searching for input: /upsampler/final_res_block/Add_output_0
[02/15/2025-20:41:00] [V] [TRT] Searching for input: upsampler.final_to_rgb.weight
[02/15/2025-20:41:00] [V] [TRT] Searching for input: upsampler.final_to_rgb.bias
[02/15/2025-20:41:00] [V] [TRT] /upsampler/final_to_rgb/Conv [Conv] inputs: [/upsampler/final_res_block/Add_output_0 -> (-1, 64, 256, 256)[HALF]], [upsampler.final_to_rgb.weight -> (3, 64, 1, 1)[HALF]], [upsampler.final_to_rgb.bias -> (3)[HALF]],
[02/15/2025-20:41:00] [V] [TRT] Convolution input dimensions: (-1, 64, 256, 256)
[02/15/2025-20:41:00] [V] [TRT] Registering layer: /upsampler/final_to_rgb/Conv for ONNX node: /upsampler/final_to_rgb/Conv
[02/15/2025-20:41:00] [V] [TRT] Using kernel: (1, 1), strides: (1, 1), prepadding: (0, 0), postpadding: (0, 0), dilations: (1, 1), numOutputs: 3, nbGroups: 1
[02/15/2025-20:41:00] [V] [TRT] Convolution output dimensions: (-1, 3, 256, 256)
[02/15/2025-20:41:00] [V] [TRT] Registering tensor: rgb_5712 for ONNX tensor: rgb
[02/15/2025-20:41:00] [V] [TRT] /upsampler/final_to_rgb/Conv [Conv] outputs: [rgb -> (-1, 3, 256, 256)[HALF]],
[02/15/2025-20:41:00] [V] [TRT] Marking rgb_5712 as output: rgb
[02/15/2025-20:41:00] [I] Finished parsing network model. Parse time: 2.15281
[02/15/2025-20:41:00] [I] Set shape of input tensor input for optimization profile 0 to: MIN=1x3x240x320 OPT=1x3x240x320 MAX=1x3x240x320
[02/15/2025-20:41:00] [V] [TRT] Trying to set exclusive file lock C:/Users/admin/AppData/Local/Temp\cbc61e.engine.cache.lock

[02/15/2025-20:41:00] [W] [TRT] Could not read timing cache from: C:/Users/admin/AppData/Local/Temp\cbc61e.engine.cache. A new timing cache will be generated and written.
[02/15/2025-20:41:00] [E] Error[4]: IBuilder::buildSerializedNetwork: Error Code 4: API Usage Error (Dimension mismatch for tensor input and profile 0. At dimension axis 2, profile has min=240, opt=240, max=240 but tensor has 64.)
[02/15/2025-20:41:00] [E] Engine could not be created from network
[02/15/2025-20:41:00] [E] Building engine failed
[02/15/2025-20:41:00] [E] Failed to create engine from model or file.
[02/15/2025-20:41:00] [E] Engine set up failed
&&&& FAILED TensorRT.trtexec [TensorRT v100800] [b43] # C:/Program Files/Hybrid/64bit/vs-mlrt\vsmlrt-cuda\trtexec --onnx=C:/Users/admin/aura-sr/AURASRv2-onnx/aura_sr_v2_fp16_single30.onnx --timingCacheFile=C:/Users/admin/AppData/Local/Temp\cbc61e.engine.cache --device=0 --saveEngine=C:/Users/admin/AppData/Local/Temp\cbc61e.engine --memPoolSize=workspace:1073741824 --shapes=input:1x3x240x320 --fp16 --verbose --tacticSources=-CUBLAS,-CUBLAS_LT,-CUDNN,+EDGE_MASK_CONVOLUTIONS,+JIT_CONVOLUTIONS --useCudaGraph --noDataTransfers --noTF32 --inputIOFormats=fp16:chw --outputIOFormats=fp32:chw --builderOptimizationLevel=3
```

@WolframRhodium
Contributor

Thanks for the information!

In L112 of your script konverzija.py, you need to use

```python
dummy_lowres = torch.randn(1, 3, 240, 320, device=device, dtype=torch.float16)
```

so that it matches --shapes=input:1x3x240x320 in trtexec's command.


(I guess the network requires mod64 input, so (1, 3, 240, 320) might not work and (1, 3, 256, 320) is a candidate)
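
If that guess is right, one workaround would be to pad the input up to the next multiple of 64 before inference and crop the output afterwards. An untested sketch; `pad_mod64` is a hypothetical helper, not part of any existing script:

```python
import torch
import torch.nn.functional as F

def pad_mod64(x: torch.Tensor) -> torch.Tensor:
    """Pad an NCHW tensor on the right/bottom so H and W become multiples of 64."""
    _, _, h, w = x.shape
    pad_h = (-h) % 64  # e.g. 240 -> 16 extra rows -> 256
    pad_w = (-w) % 64  # e.g. 320 -> already mod64 -> 0
    # Replication padding avoids hard borders; crop the upscaled
    # output back to 4*h x 4*w afterwards.
    return F.pad(x, (0, pad_w, 0, pad_h), mode="replicate")
```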

@zelenooki87

I somehow managed to convert it to support different inputs, i.e., with dynamic axes. Interestingly, it now works in chaiNNer with artifacts, while in Hybrid vs-mlrt works correctly in TensorRT mode. Engine generation takes a bit longer since the model is quite large, but the important thing is that it works correctly. The performance isn't great: on a 640x480 input with 24 GB of VRAM (RTX 3090), it reaches around 2 FPS. You can take a look, and if any further optimization is possible, feel free to modify it; then, perhaps, you could add it to the vs-mlrt project. Note that the model is primarily intended for photos without artifacts and noise.
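
Roughly, the dynamic-axes export looks like this (a sketch, not the exact contents of my script; the `.upsampler` attribute is a guess based on the `/upsampler/...` node names in the log above):

```python
import torch
from aura_sr import AuraSR

device = "cuda"
# The .upsampler attribute is an assumption suggested by the
# /upsampler/... node paths in the trtexec log.
model = AuraSR.from_pretrained("fal/AuraSR-v2").upsampler.to(device).half().eval()

# mod64-friendly dummy shape, per the note above.
dummy_lowres = torch.randn(1, 3, 256, 320, device=device, dtype=torch.float16)

torch.onnx.export(
    model,
    dummy_lowres,
    "aura_sr_v2_fp16_dynamic.onnx",
    input_names=["input"],
    output_names=["rgb"],  # matches the output tensor name in the trtexec log
    dynamic_axes={
        "input": {0: "batch", 2: "height", 3: "width"},
        "rgb": {0: "batch", 2: "height_out", 3: "width_out"},
    },
    opset_version=17,
)
```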

I just wanted to point out that when you run the original author's aura-sr project, it downloads and uses the v1 model, which is much worse. If you want to convert locally, replace the cached v1 weights (in the conda cache directory) with the v2 weights from their Hugging Face page.

https://mega.nz/folder/k1pARK5J#PP-R6FbnZi6JDStK1BXCHw

@WolframRhodium
Contributor

Thanks for the information. (It is actually faster than I expected.)
