This repository has been archived by the owner on May 29, 2023. It is now read-only.

Optimizers

Optimizers manage weight updates starting from gradient values. They may maintain complex internal state to navigate the multi-dimensional loss surface more effectively. Please use the fixed signature `__init__(hyperparameters: Namespace, named_parameters: Generator) -> None` for all subclasses.
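A minimal sketch of a subclass following that fixed signature; `SGDExample` and the `lr` hyperparameter are hypothetical names, and the assumption is that subclasses build on `torch.optim.Optimizer`:

```python
from argparse import Namespace
from typing import Generator

import torch


class SGDExample(torch.optim.Optimizer):
    """Hypothetical optimizer using the repository's fixed __init__ signature."""

    def __init__(self, hyperparameters: Namespace, named_parameters: Generator) -> None:
        # named_parameters yields (name, tensor) pairs, as from nn.Module.named_parameters()
        params = [param for _, param in named_parameters]
        defaults = {"lr": hyperparameters.lr}
        super().__init__(params, defaults)

    @torch.no_grad()
    def step(self):
        # Plain gradient descent: p <- p - lr * grad
        for group in self.param_groups:
            for p in group["params"]:
                if p.grad is not None:
                    p.add_(p.grad, alpha=-group["lr"])
```

Usage would then look like `SGDExample(Namespace(lr=0.1), model.named_parameters())`, keeping hyperparameter plumbing uniform across all optimizer subclasses.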

ElectraAdamW

This optimizer is the same as AdamW, except for a small fix to the moving-average update mechanism. The original implementation can be found here.
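For context, the "moving averages" in AdamW are the exponential moving averages of the gradient and its square. The sketch below shows the standard AdamW update for a single tensor; the exact fix ElectraAdamW applies is not specified in this file, though a known quirk of BERT/ELECTRA-style Adam implementations is the omission of bias correction, marked in a comment as one plausible point of divergence:

```python
import torch


def adamw_update(p, grad, exp_avg, exp_avg_sq, step,
                 lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8, weight_decay=0.01):
    """Standard AdamW step for one tensor (illustrative, not the repo's code)."""
    # Exponential moving averages of the gradient and its square.
    exp_avg.mul_(beta1).add_(grad, alpha=1 - beta1)
    exp_avg_sq.mul_(beta2).addcmul_(grad, grad, value=1 - beta2)
    # Bias correction: the original ELECTRA/BERT Adam omits this step,
    # which is one plausible candidate for the "small fix" (assumption).
    bias_c1 = 1 - beta1 ** step
    bias_c2 = 1 - beta2 ** step
    denom = (exp_avg_sq / bias_c2).sqrt().add_(eps)
    # Decoupled weight decay (the "W" in AdamW), then the Adam step.
    p.mul_(1 - lr * weight_decay)
    p.add_((exp_avg / bias_c1) / denom, alpha=-lr)
    return p
```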