Rate Limiting
Rate limiting is an effective mechanism to control the throughput of traffic destined to a target host. It puts a cap on how often downstream clients can send network traffic within a certain timeframe.
A common failure case arises when a large number of downstream clients send traffic to a target host: if the target host becomes backed up, the clients will overwhelm it. In this scenario it is extremely difficult to configure a circuit breaking limit on each downstream host that is tight enough to prevent cascading failure when the system starts to fail, yet loose enough for the system to operate normally during typical request patterns. In such scenarios, rate limiting traffic to the target host is effective.
OSM supports two forms of rate limiting:
- Local per-instance rate limiting at each upstream host
- Global rate limiting using a global gRPC rate limiting service
OSM leverages its UpstreamTrafficSetting API to configure rate limiting attributes for traffic directed to an upstream service. We use the term upstream service to refer to a service that receives connections/requests from clients and returns responses.
Each UpstreamTrafficSetting configuration targets an upstream host defined by the spec.host field. For a Kubernetes service my-svc in the namespace my-namespace, the UpstreamTrafficSetting resource must be created in the namespace my-namespace, and spec.host must be an FQDN of the form my-svc.my-namespace.svc.cluster.local.
Configuring local per-instance rate limiting
The specification enables configuring local rate limiting attributes for an upstream service at the connection and request level. OSM leverages Envoy’s local rate limiting capability to implement per-instance local rate limiting at each upstream host.
Local rate limiting is applicable at both the TCP (L4) connection and HTTP (L7) request level, and can be configured using the rateLimit.local attribute in the UpstreamTrafficSetting resource. TCP settings apply to both TCP and HTTP traffic, while HTTP settings only apply to HTTP traffic. Both TCP and HTTP level rate limiting are enforced using a token bucket rate limiter.
Local rate limiting of TCP connections
TCP connections can be rate limited per unit of time. An optional burst limit can be specified to allow connections above the baseline rate, to accommodate connection bursts in a short interval of time. TCP rate limiting is applied as a token bucket rate limiter at the network filter chain of the upstream service’s inbound listener. Each incoming connection processed by the filter consumes a single token. If a token is available, the connection is allowed. If no tokens are available, the connection is immediately closed.
The following attributes nested under spec.rateLimit.local.tcp define the rate limiting attributes for TCP connections:
- connections: The number of connections allowed per unit of time before rate limiting occurs on all backends belonging to the upstream host specified via the spec.host field in the UpstreamTrafficSetting configuration. This setting is applicable to both TCP and HTTP traffic.
- unit: The period of time within which connections over the limit will be rate limited. Valid values are second, minute and hour.
- burst: The number of connections above the baseline rate that are allowed in a short period of time.
Refer to the TCP local rate limiting API for additional information regarding API usage.
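As a minimal sketch of the attributes above, the following UpstreamTrafficSetting limits inbound connections to a baseline of 10 per minute with a burst allowance of 5. The service my-svc, the namespace my-namespace, the resource name, and the numeric values are illustrative assumptions rather than recommended settings.

```yaml
apiVersion: policy.openservicemesh.io/v1alpha1
kind: UpstreamTrafficSetting
metadata:
  name: my-svc-tcp-limit    # hypothetical resource name
  namespace: my-namespace   # must be the namespace of the upstream service
spec:
  host: my-svc.my-namespace.svc.cluster.local
  rateLimit:
    local:
      tcp:
        connections: 10   # baseline connections allowed per unit (illustrative)
        unit: minute      # one of: second, minute, hour
        burst: 5          # optional allowance above the baseline (illustrative)
```

Because TCP settings also apply to HTTP traffic, a policy like this one would close excess connections to an HTTP service as well.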
Local rate limiting of HTTP requests
HTTP requests can be rate limited per unit of time. An optional burst limit can be specified to allow requests above the baseline rate, to accommodate request bursts in a short interval of time. HTTP rate limiting is applied as a token bucket rate limiter at the virtual host and/or HTTP route level at the upstream backend, depending on the rate limiting configuration. Each incoming request processed by the filter consumes a single token. If a token is available, the request is allowed. If no tokens are available, the request receives the configured rate limit status.
HTTP request rate limiting can be configured at the virtual host level by specifying the rate limiting attributes nested under the spec.rateLimit.local.http field. Alternatively, rate limiting can be configured per HTTP route allowed on the upstream backend by specifying the rate limiting attributes as a part of the spec.httpRoutes field. Note that when configuring rate limiting per HTTP route, the route must match an HTTP path that has already been permitted by a service mesh policy; otherwise the rate limiting policy will be ignored.
The following rate limiting attributes can be configured for HTTP traffic:
- requests: The number of requests allowed per unit of time before rate limiting occurs on all backends belonging to the upstream host specified via the spec.host field in the UpstreamTrafficSetting configuration.
- unit: The period of time within which requests over the limit will be rate limited. Valid values are second, minute and hour.
- burst: The number of requests above the baseline rate that are allowed in a short period of time.
- responseStatusCode: The HTTP status code to use for responses to rate limited requests. Code must be in the 400-599 (inclusive) error range. If not specified, a default of 429 (Too Many Requests) is used. The code must be a status code supported by Envoy.
- responseHeadersToAdd: The list of HTTP headers as key-value pairs that should be added to each response for requests that have been rate limited.
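The attributes above might be combined as in the following sketch, which sets a virtual host level limit and a tighter limit on a single route. The service name, namespace, path, header, and numeric values are illustrative. The name/value shape of each responseHeadersToAdd item and the path and rateLimit.local layout under spec.httpRoutes are assumptions that should be checked against the API reference.

```yaml
apiVersion: policy.openservicemesh.io/v1alpha1
kind: UpstreamTrafficSetting
metadata:
  name: my-svc-http-limit   # hypothetical resource name
  namespace: my-namespace
spec:
  host: my-svc.my-namespace.svc.cluster.local
  rateLimit:
    local:
      http:
        requests: 100             # virtual host level baseline (illustrative)
        unit: minute
        burst: 20
        responseStatusCode: 503   # overrides the default of 429
        responseHeadersToAdd:     # assumed shape: list of name/value pairs
          - name: x-rate-limited  # hypothetical header
            value: "true"
  httpRoutes:
    - path: /status/200           # assumed field; must match a path already permitted by mesh policy
      rateLimit:
        local:
          requests: 3             # per-route limit (illustrative)
          unit: minute
```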
Configuring global rate limiting
The specification enables configuring global rate limiting attributes for an upstream service at the connection and request level. OSM leverages Envoy’s global rate limiting capability to implement rate limiting by making calls to a global gRPC Rate Limit Service (RLS). Any service that implements the defined RPC/IDL protocol can be used as the global rate limit service.
Global rate limiting is applicable at both the TCP (L4) connection and HTTP (L7) request level, and can be configured using the rateLimit.global attribute in the UpstreamTrafficSetting resource. TCP settings apply to both TCP and HTTP traffic, while HTTP settings only apply to HTTP traffic. Both TCP and HTTP level rate limiting are enforced by calling the rate limit service for every new connection/request to make a rate limit decision.
Unlike local rate limit configuration, global rate limit configuration does not directly define a rate limit policy for the target upstream host. Instead, global rate limiting is enforced by specifying a descriptor set in the UpstreamTrafficSetting configuration. A descriptor set is a list of hierarchical entries that are used by the Rate Limit Service (RLS) to determine the final rate limit key and overall allowed limit. The specified set of descriptors will be generated and sent to the RLS for each connection/request to make a rate limit decision.
Global rate limiting of TCP connections
TCP connections from downstream clients inbound on an upstream host can be rate limited using a global rate limiter service. The configuration specifies a specific domain and descriptor set to rate limit on. This has the ultimate effect of rate limiting connections per second that transit the upstream host’s inbound listener.
Envoy supports configuring static key-value pairs as descriptor entries to use in the rate limit service request for TCP connections.
For example, a configuration that rate limits TCP connections to the foo.demo.svc.cluster.local service in the demo namespace, using an external RLS at ratelimiter.rls.svc.cluster.local serving requests on port 8081, with the descriptor entry my_key: my_value and the rate limit domain test, looks as follows:
apiVersion: policy.openservicemesh.io/v1alpha1
kind: UpstreamTrafficSetting
metadata:
  name: foo
  namespace: demo
spec:
  host: foo.demo.svc.cluster.local
  rateLimit:
    global:
      tcp:
        rateLimitService:
          host: ratelimiter.rls.svc.cluster.local
          port: 8081
        domain: test
        failOpen: false
        timeout: 10s
        descriptors:
          - entries:
              - key: my_key
                value: my_value
Refer to the Global TCP rate limit API to learn more about the configuration attributes and an end-to-end demo to understand the global TCP rate limiting capability.
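Because the limit itself lives in the rate limit service, the RLS needs a policy that matches the test domain and the my_key: my_value descriptor. As a sketch, assuming the RLS is Envoy's reference ratelimit implementation (an assumption; any service implementing the protocol can be used, and each has its own configuration format), a policy allowing one connection per minute for this descriptor might look like the following. The limit values are illustrative.

```yaml
# Server-side policy for the RLS, not part of the UpstreamTrafficSetting resource.
domain: test                 # must match the domain sent by the upstream's proxy
descriptors:
  - key: my_key              # matches the descriptor entry key
    value: my_value          # matches the descriptor entry value
    rate_limit:
      unit: minute           # illustrative
      requests_per_unit: 1   # illustrative
```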
Global rate limiting of HTTP requests
HTTP requests from downstream clients inbound on an upstream host can be rate limited using a global rate limit service. The configuration specifies a domain and descriptor set to rate limit on. This has the ultimate effect of rate limiting requests per second that transit the upstream host’s inbound listener.
The rate limit configuration can be applied at both the virtual host and route level. The policy defines a set of request descriptors that will be generated and sent to the external RLS to make a rate limit decision on each request. A list of descriptors, each comprising one or more descriptor entries, is specified and generated based on different criteria. If an entry specified within a descriptor cannot be generated for a request, the entire descriptor is not generated. When multiple descriptors are specified, all descriptors that can be generated will be generated and sent to the rate limit service. Refer to the Envoy documentation on composing descriptors for more information.
OSM supports different kinds of descriptor entries for HTTP requests, namely genericKey, remoteAddress, requestHeader, and headerValueMatch. The following sections describe each of them.
genericKey descriptor
The genericKey descriptor entry defines a static key-value pair. If no key is specified, the genericKey descriptor entry uses generic_key as its descriptor key.
For example, the following configuration generates the descriptor ("my_key", "my_value"):
rateLimit:
  global:
    http:
      descriptors:
        - entries:
            - genericKey:
                key: my_key
                value: my_value
Refer to the genericKey descriptor demo to learn more about using the genericKey descriptor entry.
remoteAddress descriptor
The remoteAddress descriptor entry has a key of remote_address and a value of the client IP address that is populated using the trusted address from the x-forwarded-for HTTP header.
For example, the following configuration generates the descriptor ("remote_address", "<trusted address from x-forwarded-for>"):
rateLimit:
  global:
    http:
      descriptors:
        - entries:
            - remoteAddress: {}
Refer to the remoteAddress descriptor demo to learn more about using the remoteAddress descriptor entry.
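A remote_address descriptor is typically paired with an RLS policy that limits each client separately. As a sketch, again assuming Envoy's reference ratelimit implementation as the RLS and a hypothetical domain named test, omitting the value in the policy gives every distinct client address its own counter. The limit values are illustrative.

```yaml
# Server-side policy for the RLS, not part of the UpstreamTrafficSetting resource.
domain: test                  # hypothetical domain
descriptors:
  - key: remote_address       # no value given: each distinct address is counted separately
    rate_limit:
      unit: minute            # illustrative
      requests_per_unit: 10   # illustrative
```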
requestHeader descriptor
The requestHeader descriptor entry defines a descriptor entry that is generated only when the request contains a header matching the given header name. The key of the descriptor entry is static, while its value is derived from the value of the header present in the request. If the header is not present, the descriptor is not generated.
For example, the following configuration generates the descriptor ("my_header", "<value from some-header>"):
rateLimit:
  global:
    http:
      descriptors:
        - entries:
            - requestHeader:
                name: some-header
                key: my_header
Refer to the requestHeader descriptor demo to learn more about using the requestHeader descriptor entry.
headerValueMatch descriptor
The headerValueMatch descriptor entry defines a descriptor entry that is generated when the request headers match the given set of HTTP header match criteria. OSM supports the following header match operators: Exact, Prefix, Suffix, Regex, Contains, and Present.
For example, the following configuration generates the descriptor ("header_match", "foo") for requests that have the header my-header present and do not have the header other-header present:
rateLimit:
  global:
    http:
      descriptors:
        - entries:
            - headerValueMatch:
                key: header_match
                value: foo
                headers:
                  - name: my-header
                    present: true
                  - name: other-header
                    present: false
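The descriptor entry kinds described above can be combined in one policy. The following sketch reuses the names from the earlier examples and defines two descriptors. It assumes that rateLimitService and domain are nested under rateLimit.global.http in the same way as in the TCP example, which should be confirmed against the API reference.

```yaml
apiVersion: policy.openservicemesh.io/v1alpha1
kind: UpstreamTrafficSetting
metadata:
  name: foo
  namespace: demo
spec:
  host: foo.demo.svc.cluster.local
  rateLimit:
    global:
      http:
        rateLimitService:   # assumed to mirror the TCP configuration
          host: ratelimiter.rls.svc.cluster.local
          port: 8081
        domain: test
        descriptors:
          # Descriptor 1: two entries. If some-header is absent, the
          # requestHeader entry cannot be generated, so the whole
          # descriptor is skipped for that request.
          - entries:
              - genericKey:
                  key: my_key
                  value: my_value
              - requestHeader:
                  name: some-header
                  key: my_header
          # Descriptor 2: generated independently of descriptor 1.
          - entries:
              - remoteAddress: {}
```

For a request that carries some-header, both descriptors would be sent to the RLS; for a request without it, only the remote_address descriptor would be sent.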
Demos
To learn more about configuring rate limiting, refer to the following demo guides:
- Local rate limiting of TCP connections
- Local rate limiting of HTTP requests
- Global rate limiting of TCP connections
- Global rate limiting of HTTP requests