
[REQ] add GigaGAN #111

Open
MarcoRavich opened this issue Oct 1, 2024 · 7 comments
@MarcoRavich

Hi there, thanks for your work!

It would be great to have GigaGAN upscaling too: https://mingukkang.github.io/GigaGAN/

Some possibly useful implementations:

Hope that inspires!

Note: unfortunately, the VideoGigaGAN sources (by Adobe Research) are not (yet?) available...

@WolframRhodium
Contributor

Hi!

One problem with these existing alternative implementations is that they may perform worse than the original. In fact, it seems that none of these provides pre-trained weights.

The second problem is that the GigaGAN model uses text conditioning, which I think is hard to apply to video in general.

@NineMeowICT

NineMeowICT commented Oct 2, 2024

@WolframRhodium
FAL has released usable weights: https://blog.fal.ai/introducing-aurasr-an-open-reproduction-of-the-gigagan-upscaler-2/
For super-resolution the model doesn't require text conditioning. You can refer to this: https://github.com/GreenLandisaLie/AuraSR-ComfyUI
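
For reference, a minimal usage sketch based on the linked blog post — no prompt involved at all. The `fal/AuraSR-v2` repo id is an assumption taken from fal's Hugging Face page:

```python
# Minimal sketch based on the linked fal blog post; the repo id
# "fal/AuraSR-v2" is an assumption taken from fal's Hugging Face page.
from PIL import Image
from aura_sr import AuraSR  # pip install aura-sr

aura_sr = AuraSR.from_pretrained("fal/AuraSR-v2")  # downloads the weights
image = Image.open("input.png").convert("RGB")
upscaled = aura_sr.upscale_4x(image)  # plain 4x image-to-image, no text prompt
upscaled.save("output_4x.png")
```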

@WolframRhodium
Contributor

Thanks for the information.

The model is interesting, but I need to improve the existing vs-mlrt infrastructure to support it. This will take time.

@zelenooki87

@WolframRhodium
Hello. With a little difficulty, I have successfully converted the AuraSR v2 model (an open-source alternative to GigaGAN). Version v2 is quite decent, unlike the first version, which produced many artifacts. The fp32 model would have exceeded 2 GB and wouldn't have fit into a single ONNX file, so I created an fp16 version.

If it's not a problem, could you please take a look at the model structure and the conversion code? The fp16 AuraSR v2 model works perfectly fine in chaiNNer, exactly as it should. However, in Hybrid with vs-mlrt I get a green image when using DirectML, and ONNX and TensorRT modes don't work at all.

I'm sending you the ONNX model, the conversion code, and the error log from building the TensorRT engine with trtexec.
https://mega.nz/folder/8kA2mKzZ#LWahzxk-447JaRSx3Xp_OQ

If we could somehow manage to adapt it for vs-mlrt, that would be excellent!

```
[02/15/2025-20:41:00] [V] [TRT] Searching for input: /upsampler/final_res_block/block2/act/Mul_output_0
[02/15/2025-20:41:00] [V] [TRT] Searching for input: /upsampler/ups.4.0.5/Add_output_0
[02/15/2025-20:41:00] [V] [TRT] /upsampler/final_res_block/Add [Add] inputs: [/upsampler/final_res_block/block2/act/Mul_output_0 -> (-1, 64, 256, 256)[HALF]], [/upsampler/ups.4.0.5/Add_output_0 -> (-1, 64, 256, 256)[HALF]],
[02/15/2025-20:41:00] [V] [TRT] Registering layer: /upsampler/final_res_block/Add for ONNX node: /upsampler/final_res_block/Add
[02/15/2025-20:41:00] [V] [TRT] Registering tensor: /upsampler/final_res_block/Add_output_0 for ONNX tensor: /upsampler/final_res_block/Add_output_0
[02/15/2025-20:41:00] [V] [TRT] /upsampler/final_res_block/Add [Add] outputs: [/upsampler/final_res_block/Add_output_0 -> (-1, 64, 256, 256)[HALF]],
[02/15/2025-20:41:00] [V] [TRT] Static check for parsing node: /upsampler/final_to_rgb/Conv [Conv]
[02/15/2025-20:41:00] [V] [TRT] Parsing node: /upsampler/final_to_rgb/Conv [Conv]
[02/15/2025-20:41:00] [V] [TRT] Searching for input: /upsampler/final_res_block/Add_output_0
[02/15/2025-20:41:00] [V] [TRT] Searching for input: upsampler.final_to_rgb.weight
[02/15/2025-20:41:00] [V] [TRT] Searching for input: upsampler.final_to_rgb.bias
[02/15/2025-20:41:00] [V] [TRT] /upsampler/final_to_rgb/Conv [Conv] inputs: [/upsampler/final_res_block/Add_output_0 -> (-1, 64, 256, 256)[HALF]], [upsampler.final_to_rgb.weight -> (3, 64, 1, 1)[HALF]], [upsampler.final_to_rgb.bias -> (3)[HALF]],
[02/15/2025-20:41:00] [V] [TRT] Convolution input dimensions: (-1, 64, 256, 256)
[02/15/2025-20:41:00] [V] [TRT] Registering layer: /upsampler/final_to_rgb/Conv for ONNX node: /upsampler/final_to_rgb/Conv
[02/15/2025-20:41:00] [V] [TRT] Using kernel: (1, 1), strides: (1, 1), prepadding: (0, 0), postpadding: (0, 0), dilations: (1, 1), numOutputs: 3, nbGroups: 1
[02/15/2025-20:41:00] [V] [TRT] Convolution output dimensions: (-1, 3, 256, 256)
[02/15/2025-20:41:00] [V] [TRT] Registering tensor: rgb_5712 for ONNX tensor: rgb
[02/15/2025-20:41:00] [V] [TRT] /upsampler/final_to_rgb/Conv [Conv] outputs: [rgb -> (-1, 3, 256, 256)[HALF]],
[02/15/2025-20:41:00] [V] [TRT] Marking rgb_5712 as output: rgb
[02/15/2025-20:41:00] [I] Finished parsing network model. Parse time: 2.15281
[02/15/2025-20:41:00] [I] Set shape of input tensor input for optimization profile 0 to: MIN=1x3x240x320 OPT=1x3x240x320 MAX=1x3x240x320
[02/15/2025-20:41:00] [V] [TRT] Trying to set exclusive file lock C:/Users/admin/AppData/Local/Temp\cbc61e.engine.cache.lock

[02/15/2025-20:41:00] [W] [TRT] Could not read timing cache from: C:/Users/admin/AppData/Local/Temp\cbc61e.engine.cache. A new timing cache will be generated and written.
[02/15/2025-20:41:00] [E] Error[4]: IBuilder::buildSerializedNetwork: Error Code 4: API Usage Error (Dimension mismatch for tensor input and profile 0. At dimension axis 2, profile has min=240, opt=240, max=240 but tensor has 64.)
[02/15/2025-20:41:00] [E] Engine could not be created from network
[02/15/2025-20:41:00] [E] Building engine failed
[02/15/2025-20:41:00] [E] Failed to create engine from model or file.
[02/15/2025-20:41:00] [E] Engine set up failed
&&&& FAILED TensorRT.trtexec [TensorRT v100800] [b43] # C:/Program Files/Hybrid/64bit/vs-mlrt\vsmlrt-cuda\trtexec --onnx=C:/Users/admin/aura-sr/AURASRv2-onnx/aura_sr_v2_fp16_single30.onnx --timingCacheFile=C:/Users/admin/AppData/Local/Temp\cbc61e.engine.cache --device=0 --saveEngine=C:/Users/admin/AppData/Local/Temp\cbc61e.engine --memPoolSize=workspace:1073741824 --shapes=input:1x3x240x320 --fp16 --verbose --tacticSources=-CUBLAS,-CUBLAS_LT,-CUDNN,+EDGE_MASK_CONVOLUTIONS,+JIT_CONVOLUTIONS --useCudaGraph --noDataTransfers --noTF32 --inputIOFormats=fp16:chw --outputIOFormats=fp32:chw --builderOptimizationLevel=3
```

@WolframRhodium
Contributor

Thanks for the information!

In L112 of your script konverzija.py, you need to use

```python
dummy_lowres = torch.randn(1, 3, 240, 320, device=device, dtype=torch.float16)
```

so that it matches --shapes=input:1x3x240x320 in trtexec's command.


(I guess the network requires mod64 input, so (1, 3, 240, 320) might not work and (1, 3, 256, 320) is a candidate)
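
If that guess is right, one workaround would be to pad the input up to the next multiple of 64 before inference and crop the output afterwards. An untested sketch; `pad_mod64` is a hypothetical helper, not part of any existing script:

```python
import torch
import torch.nn.functional as F

def pad_mod64(x: torch.Tensor) -> torch.Tensor:
    """Pad an NCHW tensor on the right/bottom so H and W become multiples of 64."""
    _, _, h, w = x.shape
    pad_h = (-h) % 64  # e.g. 240 -> 16 extra rows -> 256
    pad_w = (-w) % 64  # e.g. 320 -> already mod64 -> 0
    # Replication padding avoids hard borders; crop the upscaled
    # output back to 4*h x 4*w afterwards.
    return F.pad(x, (0, pad_w, 0, pad_h), mode="replicate")
```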

@zelenooki87

I somehow managed to convert it to support different inputs, i.e., with dynamic axes. Interestingly, it now works in chaiNNer with artifacts, while in Hybrid vs-mlrt works correctly in TensorRT mode. Engine generation takes a bit longer since the model is quite large, but the important thing is that it works correctly. The performance isn't great: on a 640x480 input with 24 GB of VRAM (RTX 3090), it reaches around 2 FPS. You can take a look, and if any further optimization is possible, feel free to modify it; then, perhaps, you could add it to the vs-mlrt project. Note that the model is primarily intended for photos without artifacts and noise.
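
Roughly, the dynamic-axes export looks like this (a sketch, not the exact contents of my script; the `.upsampler` attribute is a guess based on the `/upsampler/...` node names in the log above):

```python
import torch
from aura_sr import AuraSR

device = "cuda"
# The .upsampler attribute is an assumption suggested by the
# /upsampler/... node paths in the trtexec log.
model = AuraSR.from_pretrained("fal/AuraSR-v2").upsampler.to(device).half().eval()

# mod64-friendly dummy shape, per the note above.
dummy_lowres = torch.randn(1, 3, 256, 320, device=device, dtype=torch.float16)

torch.onnx.export(
    model,
    dummy_lowres,
    "aura_sr_v2_fp16_dynamic.onnx",
    input_names=["input"],
    output_names=["rgb"],  # matches the output tensor name in the trtexec log
    dynamic_axes={
        "input": {0: "batch", 2: "height", 3: "width"},
        "rgb": {0: "batch", 2: "height_out", 3: "width_out"},
    },
    opset_version=17,
)
```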

I just wanted to point out that when you run the original author's aura-sr project, it downloads and uses the v1 model, which is much worse. If you want to convert locally, replace the cached v1 weights (in the conda cache directory) with the v2 weights from their Hugging Face page.

https://mega.nz/folder/k1pARK5J#PP-R6FbnZi6JDStK1BXCHw

@WolframRhodium
Contributor

Thanks for the information. (It is actually faster than I expected.)
