Rate Limiter (Spring Boot + Resilience4j)

1. Introduction

In microservices architectures, services are exposed through APIs and accessed by multiple clients such as:

Web applications
Mobile apps
External services
Other internal microservices

Sometimes systems receive too many requests in a short period of time, which can cause:

Server overload
Increased latency
Resource exhaustion
Service crashes

To protect systems from excessive traffic, the Rate Limiter Pattern is used.

2. What is a Rate Limiter

A Rate Limiter restricts the number of requests a client can make to a service within a specified time period.

Example: Allow 10 requests per second.

If a client sends 15 requests per second:

Allow → 10 requests
Reject/Delay → 5 requests

3. Why Rate Limiting is Needed

Without rate limiting, excessive requests can overwhelm a system.

Example: Client → API Gateway → Order Service

If 10,000 requests arrive suddenly:

CPU overload
Thread exhaustion
Database overload
System crash

Rate limiting prevents this by controlling request flow.

4. Where Rate Limiting is Used

Rate limiting is commonly used in:

Scenario	Example
Public APIs	Limit API usage per client
Login endpoints	Prevent brute force attacks
Payment APIs	Protect transaction systems
Microservice communication	Prevent overload
External integrations	Limit third-party API calls

5. How Rate Limiter Works

A rate limiter tracks incoming requests and allows only a fixed number of them during each time window.

Example configuration:

Limit = 5 requests / second

Request 1 → Allowed
Request 2 → Allowed
Request 3 → Allowed
Request 4 → Allowed
Request 5 → Allowed
Request 6 → Rejected

The counter resets after the time window.

6. Rate Limiting Algorithms

Different algorithms can implement rate limiting.

Fixed Window

Requests are counted within fixed, non-overlapping time windows.

Example:

Window = 1 second
Limit = 5 requests

Problem:

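Requests near window boundaries can exceed the limit. For example, 5 requests at the end of one window and 5 at the start of the next let 10 requests through in well under one second.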
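To make the boundary problem concrete, here is a minimal sketch of a fixed-window counter in plain Java. The class name FixedWindowLimiter and the explicit nowMillis parameter are assumptions made for illustration (the timestamp is passed in so the behavior is deterministic); it is not meant to reflect how Resilience4j is implemented.

```java
// Illustrative fixed-window counter; names are hypothetical.
class FixedWindowLimiter {
    private final int limit;
    private final long windowMillis;
    private long windowStart = Long.MIN_VALUE; // no window seen yet
    private int count;

    FixedWindowLimiter(int limit, long windowMillis) {
        this.limit = limit;
        this.windowMillis = windowMillis;
    }

    // Returns true if the request at time nowMillis fits into its window.
    synchronized boolean tryAcquire(long nowMillis) {
        // Start of the fixed window that contains nowMillis.
        long currentWindow = nowMillis - Math.floorMod(nowMillis, windowMillis);
        if (currentWindow != windowStart) {
            windowStart = currentWindow; // a new window begins: reset the counter
            count = 0;
        }
        if (count < limit) {
            count++;
            return true;
        }
        return false;
    }
}
```

With a limit of 5 and a 1000 ms window, five calls at t = 900 and five more at t = 1000 are all allowed, so 10 requests pass within about 100 ms.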

Sliding Window

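Counts requests over a moving time window instead of fixed intervals.

Example:

Window = last 1 second
Limit = 5 requests

Because the window moves with each request, bursts at window boundaries are counted correctly, which gives more accurate and smoother traffic control.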
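One common way to implement a sliding window is to keep a log of the timestamps of accepted requests. The sketch below assumes that approach; SlidingWindowLimiter and its nowMillis parameter are illustrative names, not part of any library.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Illustrative sliding-window log; names are hypothetical.
class SlidingWindowLimiter {
    private final int limit;
    private final long windowMillis;
    private final Deque<Long> accepted = new ArrayDeque<>(); // timestamps of accepted requests

    SlidingWindowLimiter(int limit, long windowMillis) {
        this.limit = limit;
        this.windowMillis = windowMillis;
    }

    synchronized boolean tryAcquire(long nowMillis) {
        // Forget requests that are no longer inside (now - window, now].
        while (!accepted.isEmpty() && accepted.peekFirst() <= nowMillis - windowMillis) {
            accepted.pollFirst();
        }
        if (accepted.size() < limit) {
            accepted.addLast(nowMillis);
            return true;
        }
        return false; // rejected requests are not recorded
    }
}
```

With a limit of 5 per 1000 ms, five requests accepted at t = 900 block further requests until t = 1900, so the boundary burst of a fixed window cannot happen. The cost is memory: one timestamp is stored per accepted request.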

Token Bucket

Tokens are generated at a fixed rate and collected in a bucket with a maximum capacity.

Each request consumes one token.

Example: 5 tokens generated per second.

If token available → request allowed
If no token → request rejected

Allows short bursts up to the bucket capacity, because unused tokens accumulate.
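The following is a minimal token bucket sketch in plain Java, using integer arithmetic and lazy refill (tokens are added when a request arrives rather than by a background thread). TokenBucketLimiter and its parameters are hypothetical names chosen for this example.

```java
// Illustrative token bucket with lazy refill; names are hypothetical.
class TokenBucketLimiter {
    private final long capacity;
    private final long tokensPerSecond;
    private long tokens;
    private long lastRefillMillis;

    TokenBucketLimiter(long capacity, long tokensPerSecond, long startMillis) {
        this.capacity = capacity;
        this.tokensPerSecond = tokensPerSecond;
        this.tokens = capacity; // the bucket starts full
        this.lastRefillMillis = startMillis;
    }

    synchronized boolean tryAcquire(long nowMillis) {
        refill(nowMillis);
        if (tokens > 0) {
            tokens--; // each request consumes one token
            return true;
        }
        return false;
    }

    private void refill(long nowMillis) {
        long elapsed = nowMillis - lastRefillMillis;
        if (elapsed <= 0) {
            return;
        }
        long newTokens = elapsed * tokensPerSecond / 1000;
        if (newTokens > 0) {
            tokens = Math.min(capacity, tokens + newTokens);
            // Advance only by the time that produced whole tokens,
            // so the fractional remainder carries over to the next call.
            lastRefillMillis += newTokens * 1000 / tokensPerSecond;
        }
    }
}
```

With a capacity of 5 and 5 tokens per second, a client that has been idle can send 5 requests at once (the burst), after which one further request is admitted every 200 ms.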

Leaky Bucket

Requests enter a queue and are processed at a constant rate.

Incoming → Queue → Process steadily

Smooths sudden bursts into a steady outflow. Requests that arrive while the queue is full are dropped.
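A leaky bucket can be sketched as a bounded queue that is drained at a fixed interval. In the example below, LeakyBucket, offer, and leak are illustrative names; in a real system leak would typically be driven by a scheduler, whereas here the current time is passed in explicitly to keep the example deterministic.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Illustrative leaky bucket: bounded queue drained at a constant rate. Names are hypothetical.
class LeakyBucket {
    private final int capacity;
    private final long leakIntervalMillis; // one request leaves the queue per interval
    private final Queue<Runnable> queue = new ArrayDeque<>();
    private long lastLeakMillis;

    LeakyBucket(int capacity, long leakIntervalMillis) {
        this.capacity = capacity;
        this.leakIntervalMillis = leakIntervalMillis;
    }

    // Queues a request; returns false if the bucket is full (overflow).
    synchronized boolean offer(Runnable request, long nowMillis) {
        if (queue.size() >= capacity) {
            return false;
        }
        if (queue.isEmpty()) {
            lastLeakMillis = nowMillis; // idle time must not count as drain credit
        }
        queue.add(request);
        return true;
    }

    // Runs every queued request that is due by nowMillis; returns how many were run.
    synchronized int leak(long nowMillis) {
        int processed = 0;
        while (!queue.isEmpty() && nowMillis - lastLeakMillis >= leakIntervalMillis) {
            queue.poll().run();
            lastLeakMillis += leakIntervalMillis;
            processed++;
        }
        return processed;
    }
}
```

Unlike the token bucket, the output rate never exceeds one request per interval, no matter how bursty the input is. The trade-off is that queued requests wait, and requests beyond the queue capacity are lost.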

7. Rate Limiter in Spring Boot (Resilience4j)

Step 1. Add Dependency

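<dependency>
 <groupId>io.github.resilience4j</groupId>
 <artifactId>resilience4j-spring-boot3</artifactId>
</dependency>
<dependency>
 <groupId>org.springframework.boot</groupId>
 <artifactId>spring-boot-starter-aop</artifactId>
</dependency>

The @RateLimiter annotation is applied through Spring AOP, so spring-boot-starter-aop must be on the classpath. Add a <version> to the Resilience4j dependency unless a BOM in the project already manages it.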

Step 2. Configuration (application.yml)

resilience4j:
  ratelimiter:
    instances:
      paymentService:
        limitForPeriod: 5
        limitRefreshPeriod: 1s
        timeoutDuration: 0s

Property	Meaning
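limitForPeriod	Maximum number of requests allowed in one refresh period
limitRefreshPeriod	Length of the period after which the available permits are reset
timeoutDuration	Maximum time a request waits for a permit before it is rejected

Meaning:

Allow 5 requests per second. Because timeoutDuration is 0, a request that cannot get a permit is rejected immediately instead of waiting.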

8. Service Implementation Example

Example service with rate limiter.

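import io.github.resilience4j.ratelimiter.annotation.RateLimiter;
import org.springframework.stereotype.Service;
import org.springframework.web.client.RestTemplate;

@Service
public class PaymentClient {

    private final RestTemplate restTemplate;

    // Assumes a RestTemplate bean is defined elsewhere in the application.
    public PaymentClient(RestTemplate restTemplate) {
        this.restTemplate = restTemplate;
    }

    // "paymentService" must match the instance name in application.yml.
    @RateLimiter(name = "paymentService", fallbackMethod = "fallback")
    public String callPaymentService() {
        return restTemplate.getForObject(
                "http://payment-service/pay",
                String.class
        );
    }

    // Same return type as the protected method, plus an exception parameter.
    // Called when the limiter rejects the call; because the parameter type is
    // Exception, it also receives exceptions thrown by the call itself.
    public String fallback(Exception ex) {
        return "Too many requests. Please try later.";
    }
}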

Explanation:

@RateLimiter restricts request rate
name matches configuration
fallbackMethod handles rejected requests (and, with an Exception parameter, any other exception thrown by the method)

9. Execution Flow

Client Request
      ↓
Rate Limiter
      ↓
Limit available?
      ↓
Yes → Call service
No → Reject / fallback

Request 1 → Allowed
Request 2 → Allowed
Request 3 → Allowed
Request 4 → Allowed
Request 5 → Allowed
Request 6 → Rejected

After 1 second:

Counter resets
New requests allowed

10. Drawbacks

1. Request Rejection

Valid requests may be rejected when traffic spikes.

Example:

User sends legitimate request
But system limit exceeded
Request rejected

2. Increased Latency

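If timeoutDuration is greater than 0, a request that cannot get a permit waits up to that duration for the next refresh instead of being rejected immediately. The waiting time is added to the response time.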
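The effect of a wait timeout on latency can be illustrated with a small plain-Java sketch. It is a simplified model built for this example, not Resilience4j's actual implementation; WaitingLimiter and reserve are hypothetical names. A caller either gets a permit now (wait 0), reserves a permit in a later window and is told how long to wait, or is rejected (-1) because the wait would exceed the timeout.

```java
// Illustrative model of "wait up to a timeout for a permit"; names are hypothetical.
class WaitingLimiter {
    private final int limit;
    private final long windowMillis;
    private long windowStart;
    private long used; // permits handed out for the current and future windows

    WaitingLimiter(int limit, long windowMillis, long startMillis) {
        this.limit = limit;
        this.windowMillis = windowMillis;
        this.windowStart = startMillis;
    }

    // Returns the time to wait in ms (0 = proceed now), or -1 if the request is rejected.
    synchronized long reserve(long nowMillis, long timeoutMillis) {
        long elapsedWindows = (nowMillis - windowStart) / windowMillis;
        if (elapsedWindows > 0) {
            windowStart += elapsedWindows * windowMillis;
            // Permits reserved earlier for windows that have now started stay counted.
            used = Math.max(0, used - elapsedWindows * limit);
        }
        long windowsAhead = used / limit; // 0 means a permit is free in the current window
        long wait = windowsAhead == 0
                ? 0
                : windowStart + windowsAhead * windowMillis - nowMillis;
        if (wait > timeoutMillis) {
            return -1; // waiting would take longer than the caller allows
        }
        used++;
        return wait;
    }
}
```

With a limit of 5 per second, a sixth request at t = 100 ms is rejected when the timeout is 0, but with a timeout of 1000 ms it is accepted with a 900 ms wait. That wait is the extra latency this drawback refers to, and the reserved permit also leaves only 4 permits for the next window.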

3. Configuration Complexity

Incorrect limits cause problems in either direction:

Too low: legitimate requests are blocked
Too high: too much traffic still reaches the service

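4. User Experience Issues

Users may see HTTP 429 Too Many Requests. Note that Resilience4j itself only throws RequestNotPermitted; the application must translate that exception (or the fallback result) into a 429 response.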

11. How to Overcome These Issues

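1. Choose proper limits based on system capacity

Example:

limitForPeriod = 100
limitRefreshPeriod = 1s

Derive these values from load tests or observed peak throughput rather than guessing.

2. Apply rate limiting at the API Gateway

Instead of configuring rate limits separately in every service, enforce them centrally at the API gateway.

Examples:

Kong
NGINX
Spring Cloud Gateway

3. Provide graceful fallback messages

Example:

Too many requests. Please retry after some time.

4. Use client-side exponential backoff

Clients that receive a rejection should wait progressively longer between retries instead of retrying immediately.

5. Monitor rate limiter metrics

Track how often requests are rejected so that limits can be tuned.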
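As an illustration of client-side exponential backoff, the sketch below computes the delay before each retry. The Backoff class, its method names, and the choice of "full jitter" (a random delay between 0 and the computed value) are assumptions made for this example; other jitter strategies are equally valid.

```java
import java.util.concurrent.ThreadLocalRandom;

// Illustrative backoff helper; names and the jitter strategy are assumptions.
class Backoff {

    // Delay before retry number `attempt` (0-based): base * 2^attempt, capped at maxMillis.
    static long delayMillis(int attempt, long baseMillis, long maxMillis) {
        long delay = baseMillis;
        for (int i = 0; i < attempt && delay < maxMillis; i++) {
            delay *= 2;
        }
        return Math.min(delay, maxMillis);
    }

    // Random delay in [0, delayMillis], so that many clients do not retry at the same instant.
    static long delayWithJitterMillis(int attempt, long baseMillis, long maxMillis) {
        long delay = delayMillis(attempt, baseMillis, maxMillis);
        return ThreadLocalRandom.current().nextLong(delay + 1);
    }
}
```

With a base of 100 ms and a cap of 2000 ms, the un-jittered delays are 100, 200, 400, 800, 1600, and then 2000 ms for every later attempt. A client would sleep for that long after each rejected request before trying again.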

12. Monitoring

Using Spring Boot Actuator:

Expose endpoints:

management:
  endpoints:
    web:
      exposure:
        include: "*"

Metrics:

/actuator/metrics/resilience4j.ratelimiter.available.permissions
/actuator/metrics/resilience4j.ratelimiter.waiting_threads

13. Best Practices

Apply limits at API gateway first
Add service-level limits for critical APIs
Monitor traffic patterns
Combine with Circuit Breaker and Retry

Example architecture:

Client → RateLimiter → Retry → CircuitBreaker → Service

14. Summary

A Rate Limiter is a resilience pattern that controls how many requests a service processes within a given time window. It protects systems from overload and ensures fair resource usage.

Key properties:

limitForPeriod
limitRefreshPeriod
timeoutDuration