Concurrency determines the number of simultaneous requests that can be processed by each replica of an application at any given time.
For global concurrency, you can set the container-concurrency-target-default value.
Soft versus hard concurrency limits
It is possible to set either a soft or hard concurrency limit.
NOTE: If both a soft and a hard limit are specified, the smaller of the two values will be used. This prevents the Autoscaler from having a target value that is not permitted by the hard limit value.
The soft limit is a targeted limit rather than a strictly enforced bound. In some situations, particularly if there is a sudden burst of requests, this value can be exceeded.
The hard limit is an enforced upper bound. If concurrency reaches the hard limit, surplus requests will be buffered and must wait until enough capacity is free to execute the requests.
IMPORTANT: Using a hard limit configuration is only recommended if there is a clear use case for it with your application. Having a low hard limit specified may have a negative impact on the throughput and latency of an application, and may cause additional cold starts.
- Global key: container-concurrency-target-default
- Per-revision annotation key: autoscaling.knative.dev/target
- Possible values: An integer.
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: helloworld-go
  namespace: default
spec:
  template:
    metadata:
      annotations:
        autoscaling.knative.dev/target: "200"

apiVersion: v1
kind: ConfigMap
metadata:
  name: config-autoscaler
  namespace: knative-serving
data:
  container-concurrency-target-default: "200"

apiVersion: operator.knative.dev/v1alpha1
kind: KnativeServing
metadata:
  name: knative-serving
spec:
  config:
    autoscaler:
      container-concurrency-target-default: "200"
The hard limit is specified per Revision using the containerConcurrency field on the Revision spec. This setting is not an annotation.
There is no global setting for the hard limit in the autoscaling ConfigMap, because containerConcurrency has implications outside of autoscaling, such as on buffering and queuing of requests. However, a default value can be set for the Revision’s containerConcurrency field in the config-defaults ConfigMap.
The default value is 0, meaning that there is no limit on the number of requests that are allowed to flow into the revision. A value greater than 0 specifies the exact number of requests that are allowed to flow to the replica at any one time.
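As an illustration of a non-zero value, the fragment below sets containerConcurrency to 1, so that each replica is sent at most one request at a time. This is a minimal sketch that reuses the helloworld-go Service from the other examples on this page. The value 1 is chosen only for illustration, and the container spec is omitted.

```yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: helloworld-go
  namespace: default
spec:
  template:
    spec:
      # Hard limit: at most one in-flight request per replica (illustrative value).
      # Setting this to 0, the default, would remove the limit.
      containerConcurrency: 1
```

With a limit this low, surplus requests are buffered until a replica has free capacity, so the throughput and latency caveats for hard limits apply.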
- Per-revision spec key: containerConcurrency
- Possible values: integer
- Default: 0, meaning no limit
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: helloworld-go
  namespace: default
spec:
  template:
    spec:
      containerConcurrency: 50

apiVersion: v1
kind: ConfigMap
metadata:
  name: config-defaults
  namespace: knative-serving
data:
  container-concurrency: "50"

apiVersion: operator.knative.dev/v1alpha1
kind: KnativeServing
metadata:
  name: knative-serving
spec:
  config:
    defaults:
      container-concurrency: "50"
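The two kinds of limit can also appear on the same Revision. The following fragment is a sketch, not a recommended configuration. It combines the soft-limit annotation and the hard-limit field from the examples on this page, with values chosen only to illustrate the rule that the smaller of the two is used. The container spec is omitted.

```yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: helloworld-go
  namespace: default
spec:
  template:
    metadata:
      annotations:
        # Soft limit: the concurrency the Autoscaler targets per replica.
        autoscaling.knative.dev/target: "200"
    spec:
      # Hard limit: enforced upper bound per replica.
      # Because 50 is smaller than 200, 50 is the value that is used.
      containerConcurrency: 50
```

Under the rule for specifying both limits, the Autoscaler would target 50 concurrent requests per replica rather than 200, because the target cannot exceed the hard limit.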