Configuring the Autoscaler
Since Knative v0.2, per-revision autoscalers have been replaced by a single shared autoscaler. By default, this is the Knative Pod Autoscaler (KPA), which provides fast, request-based autoscaling out of the box.
Configuring Knative Pod Autoscaler
To modify the Knative Pod Autoscaler (KPA) configuration, you must modify a Kubernetes ConfigMap called config-autoscaler in the knative-serving namespace.
You can view the default contents of this ConfigMap using the following command.
kubectl -n knative-serving get cm config-autoscaler -o yaml
Example of default ConfigMap
apiVersion: v1
kind: ConfigMap
metadata:
  name: config-autoscaler
  namespace: knative-serving
data:
  container-concurrency-target-default: "100"
  container-concurrency-target-percentage: "1.0"
  enable-scale-to-zero: "true"
  enable-vertical-pod-autoscaling: "false"
  max-scale-up-rate: "10"
  panic-window: "6s"
  scale-to-zero-grace-period: "30s"
  stable-window: "60s"
  tick-interval: "2s"
Configuring scale to zero for KPA
To correctly configure autoscaling to zero for revisions, you must modify the following parameters in the ConfigMap.
scale-to-zero-grace-period specifies the time an inactive revision is left
running before it is scaled to zero (min: 30s).
stable-window sets the length of the stable window. When operating in stable mode, the autoscaler operates on the average concurrency over the stable window.
stable-window can also be configured in the Revision template as an annotation.
Ensure that enable-scale-to-zero is set to true.
The termination period is the time that the pod takes to shut down after the last request is finished. The termination period of the pod is equal to the sum of the values of the stable-window and scale-to-zero-grace-period parameters. In the case of this example, the termination period would be 90s (60s + 30s).
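Taken together, the scale-to-zero settings in the data section of config-autoscaler look like the fragment below, shown with the default values from the example ConfigMap. This is a partial sketch rather than a complete manifest; the values are quoted because Kubernetes requires ConfigMap data values to be strings.

```yaml
# Fragment of the config-autoscaler ConfigMap (knative-serving namespace)
data:
  # Allow revisions to scale all the way down to zero pods
  enable-scale-to-zero: "true"
  # Window over which concurrency is averaged in stable mode
  stable-window: "60s"
  # Time an inactive revision keeps running before it is scaled to zero (min: 30s)
  scale-to-zero-grace-period: "30s"
  # Resulting termination period: 60s + 30s = 90s
```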
Concurrency for autoscaling can be configured using the following methods.
Configuring concurrent request limits
target defines how many concurrent requests are wanted at a given time (soft
limit) and is the recommended configuration for autoscaling in Knative.
The default value for the concurrency target is specified in the ConfigMap as 100 (container-concurrency-target-default).
This value can be configured by adding or modifying the
autoscaling.knative.dev/target annotation value in the revision template.
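A minimal sketch of a Knative Service whose revision template lowers the soft concurrency target to 50, assuming the serving.knative.dev/v1 API. The Service name and container image are hypothetical placeholders.

```yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: autoscale-demo            # hypothetical name
spec:
  template:
    metadata:
      annotations:
        # Soft limit: aim for about 50 concurrent requests per pod
        # instead of container-concurrency-target-default (100)
        autoscaling.knative.dev/target: "50"
    spec:
      containers:
        - image: example.com/autoscale-demo:latest   # hypothetical image
```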
containerConcurrency should only be used if there is a clear need to
limit how many requests reach the app at a given time. Using
containerConcurrency is only advised if the application needs to have an
enforced constraint of concurrency.
containerConcurrency limits the number of concurrent requests allowed into the application at a given time (hard limit), and is configured in the revision template.
containerConcurrency: 0 | 1 | 2-N
- A value of 1 guarantees that only one request is handled at a time by a given instance of the revision container.
- A value of 2 or more limits request concurrency to that value.
- A value of 0 means the system should decide.
If there is no autoscaling.knative.dev/target annotation, the autoscaler is configured as if the target equals containerConcurrency.
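As a sketch under the same assumptions (serving.knative.dev/v1 API, hypothetical name and image), setting containerConcurrency to 1 in the revision template enforces a hard limit of one request per container. Because no target annotation is set, the autoscaler behaves as if the target were also 1.

```yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: single-request-demo       # hypothetical name
spec:
  template:
    spec:
      # Hard limit: each container instance handles at most one request at a time
      containerConcurrency: 1
      # No autoscaling.knative.dev/target annotation is set, so the autoscaler
      # behaves as if the target equals containerConcurrency (1 here)
      containers:
        - image: example.com/single-request-demo:latest   # hypothetical image
```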
Configuring scale bounds (minScale and maxScale)
The minScale and maxScale annotations can be used to configure the minimum and maximum number of pods that can serve applications. These annotations can be used to prevent cold starts or to help control computing costs.
minScale and maxScale can be configured as follows in the revision template:
spec:
  template:
    metadata:
      annotations:
        autoscaling.knative.dev/minScale: "2"
        autoscaling.knative.dev/maxScale: "10"
Using these annotations in the revision template propagates them to PodAutoscaler objects. PodAutoscaler objects are mutable and can be modified later without modifying anything else in the Knative Serving system.
kubectl edit podautoscaler <revision-name>
NOTE: These annotations apply for the full lifetime of a revision. Even when a revision is not referenced by any route, the minimal pod count specified by minScale will still be provided. Keep in mind that non-routable revisions may be garbage collected, which enables Knative to reclaim the resources.
If the minScale annotation is not set, pods will scale to zero (or to 1 if enable-scale-to-zero is false per the ConfigMap mentioned above).
If the maxScale annotation is not set, there will be no upper limit for the number of pods created.
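The scale bounds fit into a complete Service manifest as sketched below, again assuming the serving.knative.dev/v1 API with a hypothetical name and image. Setting minScale to 1 keeps one pod running even when enable-scale-to-zero is true, which avoids cold starts at the cost of an always-on pod.

```yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: bounded-demo              # hypothetical name
spec:
  template:
    metadata:
      annotations:
        # Never scale below one pod, so requests do not wait for a cold start
        autoscaling.knative.dev/minScale: "1"
        # Cap the revision at five pods to bound computing costs;
        # omitting this annotation would leave no upper limit
        autoscaling.knative.dev/maxScale: "5"
    spec:
      containers:
        - image: example.com/bounded-demo:latest   # hypothetical image
```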
Configuring CPU-based autoscaling
NOTE: You can configure Knative autoscaling to work with either the default KPA or a CPU-based metric, that is, the Kubernetes Horizontal Pod Autoscaler (HPA).
You can configure Knative to use CPU-based autoscaling instead of the default request-based metric by adding or modifying the autoscaling.knative.dev/class and autoscaling.knative.dev/metric annotations in the revision template.
spec:
  template:
    metadata:
      annotations:
        autoscaling.knative.dev/metric: cpu
        autoscaling.knative.dev/class: hpa.autoscaling.knative.dev
Additional resources
- Go autoscaling sample
- “Knative v0.3 Autoscaling - A Love Story” blog post
- Kubernetes Horizontal Pod Autoscaler (HPA)