Retry Pattern (Spring Boot + Resilience4j)
1. Introduction
In a microservices architecture, services communicate with each other over the network (HTTP/REST calls or messaging).
Unlike local method calls, network calls can fail for temporary reasons such as:
- Network glitches
- Temporary service downtime
- Database connection issues
- Load balancer switching nodes
- Short service restarts
- Timeout due to high load
Many of these failures are transient: the same request often succeeds if it is simply attempted again.
Distributed systems commonly handle such failures with the Retry Pattern.
2. What is the Retry Pattern
The Retry Pattern automatically retries a failed operation a limited number of times before returning a failure.
Instead of failing immediately after the first error, the system:
- Detects the failure
- Waits for a configured delay
- Retries the operation
- Stops retrying after reaching the maximum attempts
Example flow:
Service A → Service B
Execution:
- Attempt 1 → Fail
- Attempt 2 (retry) → Fail
- Attempt 3 (retry) → Fail
- Attempt 4 (retry) → Success
The caller still experiences one logical request, even though the system made four physical calls internally (this requires a limit of at least 4 attempts).
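To make the steps above concrete, here is a minimal hand-rolled retry loop in plain Java. It is a sketch under simple assumptions (a fixed delay, and every RuntimeException treated as transient) and is not how Resilience4j is implemented; the class name SimpleRetry is hypothetical.

```java
import java.util.function.Supplier;

// Hand-rolled fixed-delay retry loop (illustrative; not Resilience4j's implementation).
final class SimpleRetry {

    private SimpleRetry() {
    }

    /**
     * Calls the operation up to maxAttempts times (the first call included),
     * sleeping waitMillis between attempts. Rethrows the last failure if all attempts fail.
     * Assumption: every RuntimeException is treated as transient.
     */
    static <T> T execute(Supplier<T> operation, int maxAttempts, long waitMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        RuntimeException lastFailure = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return operation.get();          // success: stop retrying
            } catch (RuntimeException e) {
                lastFailure = e;                 // detect the failure
                if (attempt < maxAttempts) {
                    sleep(waitMillis);           // wait before the next attempt
                }
            }
        }
        throw lastFailure;                       // maximum attempts reached
    }

    private static void sleep(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();  // keep the interrupt flag set
            throw new IllegalStateException("Interrupted while waiting to retry", e);
        }
    }
}
```

Calling SimpleRetry.execute with maxAttempts = 4 on an operation that fails three times and then succeeds reproduces the flow above: four physical calls, one logical result. With maxAttempts = 3 the same operation would surface its third failure to the caller.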
3. Why Retry is Needed
Temporary failures are common in distributed systems.
Examples:
| Situation | Retry Helps |
|---|---|
| Temporary network glitch | Yes |
| Short database outage | Yes |
| Service restarting | Yes |
| Load balancer switching nodes | Yes |
Example:
Without retry:
- Order Service → Payment Service (temporary failure)
- Result → Error returned
With retry:
- Order Service → Payment Service (first attempt fails)
- Order Service retries automatically; the second attempt succeeds
The user never sees the transient failure.
4. Where Retry is Used
Retry is typically used in external communication or unstable network environments.
Common use cases:
- External API calls
- Payment gateway requests
- Internal microservice communication
- Database connection retry
- Message processing retry
- Network-dependent operations
Example:
Order Service → Payment Service
If the Payment Service fails temporarily, a retry can recover the request automatically, provided the call is safe to repeat (see Section 5).
5. When NOT to Use Retry
Retry should not be used blindly.
Avoid retry in the following scenarios:
| Scenario | Reason |
|---|---|
| Validation errors | Bad input will always fail |
| Permanent failures | Retry will not fix the problem |
| Non-idempotent operations | Can cause duplicate actions |
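Example of a dangerous retry scenario: a Create Payment request is retried after a failure.
If the first request succeeded but its response was lost (for example, due to a timeout):
- Attempt 1 → Payment created, response lost
- Attempt 2 → Payment created again
This results in duplicate payments, because the client cannot distinguish "not processed" from "processed, but the reply was lost".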
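The lost-response case can be reproduced with a small in-memory simulation. PaymentServer, its dropNextResponse flag, and LostResponseDemo are hypothetical: the server records the payment first and then "loses" its first reply, which the client can only observe as a failure.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical non-idempotent server: every call to createPayment adds a new record.
final class PaymentServer {
    private final List<String> payments = new ArrayList<>();
    private boolean dropNextResponse = true;   // simulate: the first reply never reaches the client

    String createPayment(String orderId) {
        String paymentId = "pay-" + (payments.size() + 1);
        payments.add(paymentId + ":" + orderId);          // side effect happens first
        if (dropNextResponse) {
            dropNextResponse = false;
            throw new RuntimeException("response lost (timeout)");  // client sees a failure
        }
        return paymentId;
    }

    int paymentCount() {
        return payments.size();
    }
}

final class LostResponseDemo {
    /** Naive client: retries createPayment after any failure, up to maxAttempts calls. */
    static String payWithNaiveRetry(PaymentServer server, String orderId, int maxAttempts) {
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return server.createPayment(orderId);
            } catch (RuntimeException e) {
                last = e;   // cannot tell "not processed" from "processed, reply lost"
            }
        }
        throw last;
    }

    public static void main(String[] args) {
        PaymentServer server = new PaymentServer();
        String id = payWithNaiveRetry(server, "order-42", 2);
        System.out.println(id + ", payments recorded: " + server.paymentCount());
    }
}
```

Running main prints `pay-2, payments recorded: 2`: the retry "succeeded", but the customer was charged twice.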
6. Retry Terminology
Important retry configuration concepts:
| Term | Meaning |
|---|---|
| maxAttempts | Maximum number of attempts including the first call |
| waitDuration | Delay between retry attempts |
| retryExceptions | Exceptions that trigger retry |
| ignoreExceptions | Exceptions that never trigger a retry |
7. Retry Workflow
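Client Request
↓
Method Execution (success → return result)
↓
Failure Occurs
↓
Retry Policy Check (Is the exception retryable? Are attempts left?)
↓
Retry Allowed?
↓
Yes → Wait → Retry (back to Method Execution)
No → Return Failure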
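The "Retry Policy Check" step can be sketched as a small decision function. The RetryPolicy class, its two exception lists, and the rule that unlisted exceptions fail fast are illustrative assumptions loosely modelled on the retryExceptions and ignoreExceptions settings from Section 6; this is not Resilience4j's API.

```java
import java.util.List;

// Hypothetical policy object answering "Retry Allowed?" after a failure.
final class RetryPolicy {
    private final int maxAttempts;
    private final List<Class<? extends Throwable>> retryOn;  // transient errors worth retrying
    private final List<Class<? extends Throwable>> ignore;   // permanent errors, never retried

    RetryPolicy(int maxAttempts,
                List<Class<? extends Throwable>> retryOn,
                List<Class<? extends Throwable>> ignore) {
        this.maxAttempts = maxAttempts;
        this.retryOn = retryOn;
        this.ignore = ignore;
    }

    /** True if another attempt is allowed after failure on the given 1-based attempt. */
    boolean shouldRetry(Throwable failure, int attempt) {
        if (attempt >= maxAttempts) {
            return false;                       // attempt budget used up
        }
        for (Class<? extends Throwable> type : ignore) {
            if (type.isInstance(failure)) {
                return false;                   // e.g. a validation error
            }
        }
        for (Class<? extends Throwable> type : retryOn) {
            if (type.isInstance(failure)) {
                return true;                    // e.g. a network I/O error
            }
        }
        return false;                           // assumption: unknown errors fail fast
    }
}
```

For example, a policy built with maxAttempts 3, retryOn = [IOException], and ignore = [IllegalArgumentException] allows a retry after an IOException on attempt 1, refuses one after attempt 3, and never retries an IllegalArgumentException.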
8. Retry Implementation in Spring Boot
In Spring Boot microservices, retry is commonly implemented using the Resilience4j Retry module.
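Step 1 - Add Dependency
Add the Resilience4j starter dependency. The @Retry annotation is applied through Spring AOP, so spring-boot-starter-aop must also be on the classpath; spring-boot-starter-actuator is needed for the metrics in Section 14.
<dependency>
    <groupId>io.github.resilience4j</groupId>
    <artifactId>resilience4j-spring-boot3</artifactId>
    <!-- resilience4j.version is a placeholder property; omit <version> if a BOM manages it -->
    <version>${resilience4j.version}</version>
</dependency>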
This dependency provides multiple resilience features:
- Retry
- Circuit Breaker
- Bulkhead
- Rate Limiter
- TimeLimiter
Step 2 - Configuration in application.yml
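Retry instances are configured in application.yml under resilience4j.retry.instances:
resilience4j:
  retry:
    instances:
      paymentService:
        maxAttempts: 3
        waitDuration: 1s
        retryExceptions:
          - java.io.IOException
          # RestTemplate wraps I/O errors in this runtime exception
          - org.springframework.web.client.ResourceAccessException
        ignoreExceptions:
          - java.lang.IllegalArgumentException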
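This means the method is called at most three times, with a 1-second wait before each retry:
Attempt 1 → Initial call
Attempt 2 → Retry (after 1s)
Attempt 3 → Retry (after 1s)
Exceptions listed under retryExceptions trigger a retry. An IllegalArgumentException (listed under ignoreExceptions) is thrown immediately without retrying, and exceptions not listed under retryExceptions are not retried either.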
| Property | Description |
|---|---|
| maxAttempts | Total attempts including first call |
| waitDuration | Delay between retries |
| retryExceptions | Exceptions that trigger retry |
| ignoreExceptions | Exceptions that should not trigger retry |
9. Service Implementation Example
Example service calling another microservice.
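import io.github.resilience4j.retry.annotation.Retry;
import org.springframework.stereotype.Service;
import org.springframework.web.client.RestTemplate;

@Service
public class PaymentClient {

    private final RestTemplate restTemplate;

    // Constructor injection; assumes a RestTemplate bean is defined elsewhere
    public PaymentClient(RestTemplate restTemplate) {
        this.restTemplate = restTemplate;
    }

    // Retries according to resilience4j.retry.instances.paymentService
    @Retry(name = "paymentService")
    public String callPaymentService() {
        return restTemplate.getForObject("http://payment-service/pay", String.class);
    }
}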
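Explanation:
- @Retry wraps the method with the retry behavior configured in application.yml
- name must match an instance under resilience4j.retry.instances (here, paymentService)
- The annotation is applied through a Spring AOP proxy, so it only takes effect when the method is called from another bean, not from inside PaymentClient itself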
10. Retry with Fallback
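The fallback executes when the call still fails after all retry attempts are exhausted.
@Retry(name = "paymentService", fallbackMethod = "fallback")
public String callPaymentService() {
    return restTemplate.getForObject(
        "http://payment-service/pay",
        String.class
    );
}

// Must be in the same class, return the same type, and take the original
// method's parameters (none here) plus one trailing exception parameter
public String fallback(Exception ex) {
    return "Payment service temporarily unavailable";
}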
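The retry-then-fallback idea can also be sketched without Spring: try up to maxAttempts times, and if every attempt fails, hand the last exception to a fallback function. RetryWithFallback is a hypothetical helper that mirrors the idea of fallbackMethod; it is not how Resilience4j wires the fallback, and it omits the wait between attempts for brevity.

```java
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical helper: retry, then degrade gracefully once every attempt has failed.
final class RetryWithFallback {

    private RetryWithFallback() {
    }

    static <T> T call(Supplier<T> operation, int maxAttempts,
                      Function<RuntimeException, T> fallback) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        RuntimeException lastFailure = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return operation.get();           // success: no fallback needed
            } catch (RuntimeException e) {
                lastFailure = e;                  // waiting between attempts omitted
            }
        }
        return fallback.apply(lastFailure);       // all attempts failed
    }
}
```

A call such as `RetryWithFallback.call(paymentCall, 3, ex -> "Payment service temporarily unavailable")` returns the service's response if any of the three attempts succeeds, and the fallback message otherwise.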
11. Retry Delay Strategies
Retry delay can follow different patterns.
1. Fixed Delay
Each retry waits the same duration.
Example:
Attempt 1
Wait 1s
Attempt 2
Wait 1s
Attempt 3
2. Exponential Backoff
Delay increases gradually.
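Example (maxAttempts: 4, waitDuration: 1s, multiplier: 2):
Attempt 1
Wait 1s
Attempt 2
Wait 2s
Attempt 3
Wait 4s
Attempt 4
Growing waits give the failing service more time to recover and spread retries out, which reduces the extra load they put on it.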
Example configuration:
resilience4j:
  retry:
    instances:
      paymentService:
        maxAttempts: 4
        waitDuration: 1s
        enableExponentialBackoff: true
        exponentialBackoffMultiplier: 2
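The waits in both strategies can be computed with one formula, assuming the n-th wait is waitDuration × multiplier^(n-1): a multiplier of 1 gives a fixed delay, and a multiplier of 2 reproduces the 1s, 2s, 4s sequence above. The Backoff class is a hypothetical helper, not part of Resilience4j.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper computing the wait before each retry.
final class Backoff {

    private Backoff() {
    }

    /** Returns the maxAttempts - 1 waits that separate maxAttempts calls. */
    static List<Duration> schedule(int maxAttempts, Duration initialWait, double multiplier) {
        List<Duration> waits = new ArrayList<>();
        double millis = initialWait.toMillis();
        for (int retry = 1; retry < maxAttempts; retry++) {
            waits.add(Duration.ofMillis(Math.round(millis)));
            millis *= multiplier;                 // multiplier 1.0 keeps the delay fixed
        }
        return waits;
    }
}
```

With the configuration above, `Backoff.schedule(4, Duration.ofSeconds(1), 2.0)` returns `[PT1S, PT2S, PT4S]`, while `Backoff.schedule(3, Duration.ofSeconds(1), 1.0)` gives the fixed-delay sequence `[PT1S, PT1S]`.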
12. Drawbacks / Disadvantages of Retry
Retry improves reliability but also introduces risks.
1. Duplicate Operations
Retrying non-idempotent operations can create duplicate transactions.
Example:
Create Payment
Retry request
Result:
Payment created twice
2. Retry Storm
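If many clients retry at the same time against a struggling service, the extra calls can cause a traffic explosion (a retry storm).
Example:
1000 failing requests
Each request is attempted 3 times (maxAttempts: 3)
Total load = 1000 × 3 = 3000 calls
This can overload a service that is already failing.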
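The arithmetic above can be written as a small helper. It assumes the worst case, where every attempt fails. The second method additionally assumes that each layer of a call chain retries independently (for example, Order Service retries Payment Service, which in turn retries its own downstream call), so attempts multiply per layer. RetryLoad is a hypothetical name.

```java
// Hypothetical worst-case arithmetic for retry load (not a traffic model).
final class RetryLoad {

    private RetryLoad() {
    }

    /** Calls reaching one service when every request is tried maxAttempts times. */
    static long worstCaseCalls(long requests, int maxAttempts) {
        return requests * maxAttempts;
    }

    /** Calls reaching the deepest service when each of `layers` hops retries independently. */
    static long worstCaseCallsAtDepth(long requests, int maxAttempts, int layers) {
        long calls = requests;
        for (int i = 0; i < layers; i++) {
            calls *= maxAttempts;                 // each retrying layer multiplies the load
        }
        return calls;
    }
}
```

`worstCaseCalls(1000, 3)` gives the 3000 calls from the example, while two retrying layers (`worstCaseCallsAtDepth(1000, 3, 2)`) already produce 9000 calls at the deepest service.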
3. Increased Latency
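Retries increase response time, because the caller waits for every failed attempt plus the delays between attempts.
Example:
Attempt 1 → Fail
Wait 1s
Attempt 2 → Success
Total response time = duration of attempt 1 + 1s + duration of attempt 2.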
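For a rough upper bound on latency, assume each failed attempt runs until a per-call timeout. With a fixed delay, the worst case is then maxAttempts × timeout + (maxAttempts − 1) × waitDuration. The timeout value is an assumption (the configuration in this document does not set one), and RetryLatency is a hypothetical name.

```java
// Hypothetical worst-case latency estimate for fixed-delay retry.
final class RetryLatency {

    private RetryLatency() {
    }

    /** Assumes every attempt uses its full timeout and a fixed wait separates attempts. */
    static long worstCaseMillis(int maxAttempts, long perAttemptTimeoutMillis, long waitMillis) {
        long attemptsTime = maxAttempts * perAttemptTimeoutMillis;
        long waitingTime = (maxAttempts - 1L) * waitMillis;
        return attemptsTime + waitingTime;
    }
}
```

With maxAttempts 3, an assumed 2-second timeout, and the 1-second wait, `worstCaseMillis(3, 2000, 1000)` is 8000 ms: a user could wait 8 seconds before seeing an error.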
4. Resource Consumption
Multiple retries consume:
- CPU
- Threads
- Network connections
13. How to Overcome Retry Problems
1. Use Idempotent APIs
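Operations should produce the same result no matter how many times they are executed.
Example:
PUT /payment/{id} (the client chooses the ID, so a repeated call updates the same payment)
Instead of:
POST /payment (each call creates a new payment)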
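The difference between the two endpoints can be illustrated with an in-memory store standing in for the server. PaymentStore and its post and put methods are hypothetical, and no HTTP is involved.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical in-memory store contrasting POST-style and PUT-style handlers.
final class PaymentStore {
    private final Map<String, Long> payments = new LinkedHashMap<>();  // paymentId -> amount
    private final AtomicLong nextId = new AtomicLong(1);

    /** POST /payment: the server picks a new ID on every call, so a retry adds another payment. */
    String post(long amount) {
        String id = "pay-" + nextId.getAndIncrement();
        payments.put(id, amount);
        return id;
    }

    /** PUT /payment/{id}: the client supplies the ID, so repeating the call overwrites one entry. */
    void put(String id, long amount) {
        payments.put(id, amount);
    }

    int size() {
        return payments.size();
    }
}
```

Calling post twice leaves two payments in the store, while calling put("pay-1", 100) twice leaves exactly one.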
2. Use Idempotency Keys
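The client generates a unique key for each logical operation and sends the same key on every retry.
Example:
Idempotency-Key: 123456
The server records which keys it has already processed; a repeated key returns the original result instead of processing the request again.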
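A server-side sketch of idempotency-key handling, assuming an in-memory map; a production service would persist keys (typically with an expiry) so they survive restarts and are shared across instances. IdempotentPaymentService is a hypothetical name. ConcurrentHashMap.computeIfAbsent applies the mapping function at most once per key, so concurrent retries with the same key create only one payment.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical service that processes each idempotency key once.
final class IdempotentPaymentService {
    private final Map<String, String> resultsByKey = new ConcurrentHashMap<>();
    private final AtomicInteger paymentsCreated = new AtomicInteger();

    /** Creates the payment once per key; repeats with the same key return the stored result. */
    String createPayment(String idempotencyKey, long amount) {
        return resultsByKey.computeIfAbsent(idempotencyKey, key -> {
            int n = paymentsCreated.incrementAndGet();   // the real side effect would happen here
            return "pay-" + n + " (" + amount + ")";
        });
    }

    int paymentsCreated() {
        return paymentsCreated.get();
    }
}
```

Two calls with key "123456" return the same result and create a single payment; a request with a new key creates a new one.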
3. Limit Retry Attempts
Example:
maxAttempts = 3
Avoid very high retry counts.
4. Use Exponential Backoff
Increase delay between retries to avoid traffic spikes.
5. Combine Retry with Circuit Breaker
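Typical production flow (Resilience4j applies the same order by default when @Retry and @CircuitBreaker are used on the same method):
Retry
↓
Circuit Breaker
↓
Service Call
If the service keeps failing:
- The Circuit Breaker opens and rejects calls immediately, without reaching the service
- Retries no longer add load to the failing service; to stop retrying altogether, add io.github.resilience4j.circuitbreaker.CallNotPermittedException to ignoreExceptions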
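The interaction can be sketched with two heavily simplified, hypothetical classes: a count-based breaker with no half-open state or time window, and a retry loop wrapped around it that treats an open circuit as non-retryable. SimpleCircuitBreaker, CircuitOpenException, and RetryAroundBreaker are illustrative names, not Resilience4j types.

```java
import java.util.function.Supplier;

// Hypothetical count-based breaker: opens after failureThreshold consecutive failures.
final class SimpleCircuitBreaker {
    private final int failureThreshold;
    private int consecutiveFailures = 0;

    SimpleCircuitBreaker(int failureThreshold) {
        this.failureThreshold = failureThreshold;
    }

    boolean isOpen() {
        return consecutiveFailures >= failureThreshold;
    }

    <T> T call(Supplier<T> service) {
        if (isOpen()) {
            throw new CircuitOpenException();   // reject without touching the service
        }
        try {
            T result = service.get();
            consecutiveFailures = 0;            // a success resets the failure streak
            return result;
        } catch (RuntimeException e) {
            consecutiveFailures++;
            throw e;
        }
    }
}

final class CircuitOpenException extends RuntimeException {
    CircuitOpenException() {
        super("circuit open: call not permitted");
    }
}

// Retry is the outer layer, as in the flow above.
final class RetryAroundBreaker {

    private RetryAroundBreaker() {
    }

    static <T> T call(SimpleCircuitBreaker breaker, Supplier<T> service, int maxAttempts) {
        RuntimeException lastFailure = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return breaker.call(service);
            } catch (CircuitOpenException e) {
                throw e;                        // open circuit: stop retrying immediately
            } catch (RuntimeException e) {
                lastFailure = e;                // ordinary failure: try again
            }
        }
        throw lastFailure;
    }
}
```

With a threshold of 2 and maxAttempts 3, a permanently failing service is called only twice: the third attempt finds the circuit open and the retry loop gives up, and later requests are rejected without reaching the service at all.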
14. Monitoring Retry
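Retry metrics can be monitored using Spring Boot Actuator (the metrics endpoint must be exposed, e.g. via management.endpoints.web.exposure.include).
/actuator/metrics/resilience4j.retry.calls
The resilience4j.retry.calls metric is tagged by instance name and by kind (successful_without_retry, successful_with_retry, failed_without_retry, failed_with_retry), so a single outcome can be queried with, for example, ?tag=kind:failed_with_retry.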
15. Best Practices
- Retry only transient failures
- Use exponential backoff
- Limit retry attempts
- Avoid retry for non-idempotent operations
- Combine retry with circuit breaker
- Monitor retry metrics