diff --git a/docs/source/autoscaling.mdx b/docs/source/autoscaling.mdx
index dce38e6..23fd841 100644
--- a/docs/source/autoscaling.mdx
+++ b/docs/source/autoscaling.mdx
@@ -10,7 +10,7 @@ The autoscaling process is triggered based on the accelerator's utilization metr

 - **GPU Accelerators**: A new replica is added when the average GPU utilization of all replicas over a 2-minute window reaches 80%.

-It's important to note that the scaling up process takes place every 3 minutes, while the scaling down process takes 5 minutes. This frequency ensures a balance between responsiveness and stability of the autoscaling system.
+It's important to note that the scaling up process takes place every minute, while the scaling down process takes 2 minutes. This frequency ensures a balance between responsiveness and stability of the autoscaling system, with a stabilization of 300 seconds once scaled up or down.

 ## Considerations for Effective Autoscaling
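The timing in the updated paragraph can be read as a small control loop. The sketch below is a minimal Python model of that reading, not the endpoint's actual implementation. It makes several assumptions. Scale-up is evaluated on 60-second ticks. The scale-down figure ("takes 2 minutes") is interpreted as a 120-second evaluation interval. The 300-second stabilization blocks any further scaling after either kind of scale event. The scale-up rule (2-minute average reaching 80%) comes from the GPU bullet. The `Autoscaler` class, the replica bounds, and the 30% scale-down threshold are hypothetical, because the diff does not state a scale-down condition.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Timing from the updated paragraph, in seconds.
SCALE_UP_INTERVAL_S = 60      # scale-up is evaluated every minute
SCALE_DOWN_INTERVAL_S = 120   # assumption: "takes 2 minutes" read as a 2-minute evaluation interval
STABILIZATION_S = 300         # no further scaling for 300 s after any scale event

# From the GPU bullet: add a replica when the 2-minute average reaches 80%.
WINDOW_S = 120
SCALE_UP_THRESHOLD = 80.0

# HYPOTHETICAL: the docs give no scale-down condition; 30% is illustrative only.
SCALE_DOWN_THRESHOLD = 30.0


@dataclass
class Autoscaler:
    """Hypothetical model of the scaling policy described in the docs."""
    replicas: int = 1
    min_replicas: int = 1  # hypothetical lower bound
    max_replicas: int = 4  # hypothetical upper bound
    last_scale_at: Optional[int] = None
    # (timestamp in seconds, GPU utilization % already averaged across replicas)
    samples: List[Tuple[int, float]] = field(default_factory=list)

    def record(self, t: int, util: float) -> None:
        """Store one utilization sample."""
        self.samples.append((t, util))

    def window_average(self, t: int) -> float:
        """Mean utilization over the trailing WINDOW_S seconds, ending at t."""
        recent = [u for ts, u in self.samples if t - WINDOW_S < ts <= t]
        return sum(recent) / len(recent)

    def stabilizing(self, t: int) -> bool:
        """True while still inside the stabilization period after the last scale event."""
        return self.last_scale_at is not None and t - self.last_scale_at < STABILIZATION_S

    def tick(self, t: int) -> str:
        """Evaluate the policy at second t and return 'up', 'down', or 'hold'."""
        # Wait until a full 2-minute window of samples exists.
        if not self.samples or t - self.samples[0][0] < WINDOW_S:
            return "hold"
        if self.stabilizing(t):
            return "hold"
        avg = self.window_average(t)
        if (t % SCALE_UP_INTERVAL_S == 0 and avg >= SCALE_UP_THRESHOLD
                and self.replicas < self.max_replicas):
            self.replicas += 1
            self.last_scale_at = t
            return "up"
        if (t % SCALE_DOWN_INTERVAL_S == 0 and avg < SCALE_DOWN_THRESHOLD
                and self.replicas > self.min_replicas):
            self.replicas -= 1
            self.last_scale_at = t
            return "down"
        return "hold"


# Usage: sustained 90% load, sampled every 10 seconds for 15 minutes.
scaler = Autoscaler()
events = []
for t in range(0, 901, 10):
    scaler.record(t, 90.0)
    action = scaler.tick(t)
    if action != "hold":
        events.append((t, action))
print(events)
```

Under these assumptions, the first replica is added at 120 s, once a full 2-minute window exists. Later additions land at 420 s and 720 s. The 60-second tick alone would allow a new replica every minute, so the 300-second stabilization is what spaces the additions apart. This matches the paragraph's point about trading responsiveness for stability.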