Kafka Lag


1. What is Lag?

Lag = number of messages produced to a partition that a consumer group has not yet consumed


Formula

Lag = Latest Offset (in partition) - Consumer Offset


Example

Partition 0:

Offset 0 → Event A
Offset 1 → Event B
Offset 2 → Event C
Offset 3 → Event D
Offset 4 → Event E

Consumer processed till:

Offset 2

Lag = 4 - 2 = 2

Pending messages: D, E
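
The example above can be reproduced in a few lines. This is a minimal sketch that models the partition as a plain Python list, with the list index standing in for the offset; `partition_lag` and the variable names are illustrative and not part of any Kafka client.

```python
def partition_lag(latest_offset: int, consumer_offset: int) -> int:
    """Lag = latest offset in the partition - consumer offset."""
    if consumer_offset > latest_offset:
        raise ValueError("consumer offset cannot be ahead of the latest offset")
    return latest_offset - consumer_offset


# Partition 0 from the example: the list index plays the role of the offset.
partition_0 = ["Event A", "Event B", "Event C", "Event D", "Event E"]

latest_offset = len(partition_0) - 1  # offset of Event E
consumer_offset = 2                   # consumer has processed up to Event C

lag = partition_lag(latest_offset, consumer_offset)
pending = partition_0[consumer_offset + 1:]  # everything after the consumer offset

print(lag)      # 2
print(pending)  # ['Event D', 'Event E']
```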


2. Consumer Lag (Actual Kafka Concept)

Consumer lag means:

How many messages are waiting to be processed by a consumer

Visualization

Producer → writes till offset 10
Consumer → processed till offset 6

Lag = 10 - 6 = 4

Real Meaning

  • Consumer is behind
  • Messages are waiting
  • System may be slow or overloaded

Where Lag is Stored

Kafka tracks:

  • Latest offset (log end offset) → in the partition's log
  • Consumer offset → in the internal topic __consumer_offsets

Kafka CLI — Check Lag

kafka-consumer-groups.sh \
  --bootstrap-server localhost:9092 \
  --describe \
  --group order-group

Output Example

TOPIC             PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG
order.created.v1  0          5               10              5

Meaning

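  • CURRENT-OFFSET → Last offset committed by the consumer group (where it resumes reading)
  • LOG-END-OFFSET → Offset the next produced message will get (one past the latest message)
  • LAG → LOG-END-OFFSET - CURRENT-OFFSET = pending messages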
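
If the CLI output needs to be consumed by a script, the LAG column can be pulled out by header name. The sketch below is an assumption-laden example, not an official parser: it works on the simplified sample shown above (real output usually carries extra columns such as GROUP, CONSUMER-ID, HOST and CLIENT-ID), and `parse_lag` is a hypothetical helper name.

```python
SAMPLE_OUTPUT = """\
TOPIC PARTITION CURRENT-OFFSET LOG-END-OFFSET LAG
order.created.v1 0 5 10 5
"""


def parse_lag(describe_output: str) -> dict[tuple[str, int], int]:
    """Map (topic, partition) -> lag from whitespace-separated --describe output."""
    lines = [ln.split() for ln in describe_output.splitlines() if ln.strip()]
    if not lines:
        return {}
    header, rows = lines[0], lines[1:]

    # Look columns up by name so extra columns do not shift the indexes.
    topic_i = header.index("TOPIC")
    part_i = header.index("PARTITION")
    lag_i = header.index("LAG")

    result = {}
    for row in rows:
        # Skip short rows and rows whose LAG field is not a plain number.
        if len(row) <= lag_i or not row[lag_i].isdigit():
            continue
        result[(row[topic_i], int(row[part_i]))] = int(row[lag_i])
    return result


print(parse_lag(SAMPLE_OUTPUT))  # {('order.created.v1', 0): 5}
```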

3. Why Consumer Lag Happens


1. Slow Consumer

  • Heavy processing
  • DB calls
  • External APIs

2. High Producer Rate

Producer sending 1000 msgs/sec
Consumer processing 100 msgs/sec

Lag increases continuously

3. Consumer Down

Producer continues
Consumer not running

Lag keeps growing

4. Errors / Retries

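  • Repeated retries hold up the messages behind the failing one
  • Failed messages take the extra DLT (dead-letter topic) flow
  • Every failure adds processing time and slows down consumption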

4. Is Lag Always Bad?

Not Always

  • Small lag → Normal
  • Temporary spike → Acceptable
  • Continuously increasing → Problem

Healthy System

Lag increases → decreases → stable

Problem System

Lag continuously increasing
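
The difference between the two patterns can be checked on a series of lag samples. This is a deliberately simple heuristic, assuming the samples were taken at regular intervals; `lag_trend` and the sample values are made up for illustration, and a real alert would normally use a longer window and a threshold.

```python
def lag_trend(samples: list[int]) -> str:
    """Classify lag samples: 'growing' if every sample is higher than the previous one."""
    if len(samples) < 2:
        return "unknown"
    deltas = [later - earlier for earlier, later in zip(samples, samples[1:])]
    if all(delta > 0 for delta in deltas):
        return "growing"
    return "ok"


healthy = [10, 50, 20, 12, 11]     # spike, then back to a stable level
problem = [100, 400, 900, 1600]    # never recovers

print(lag_trend(healthy))  # ok
print(lag_trend(problem))  # growing
```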


5. Producer Lag (Clarification)

Kafka does NOT officially define “producer lag”

What People Mean by Producer Lag

Case 1 — Producer Send Delay

  • network delay
  • batching delay

Case 2 — Waiting for ACK

acks=all → producer waits until all in-sync replicas have acknowledged the write

Case 3 — Buffer Full

buffer.memory full → producer send() blocks (for up to max.block.ms, then fails)

Important Point

Producer lag is not a Kafka metric
Consumer lag is the real metric


6. Real Production Example

  • Order Service → produces 1000 events/sec
  • Payment Service → consumes 200 events/sec

Lag increases by 800/sec

After 10 seconds: Lag = 8000
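
The arithmetic behind this example, as a small sketch. It assumes constant produce and consume rates, which real traffic rarely has; `lag_after` is an illustrative helper, not a Kafka API.

```python
def lag_after(produce_rate: int, consume_rate: int, seconds: int, initial_lag: int = 0) -> int:
    """Lag after `seconds`, assuming both rates (messages/sec) stay constant."""
    growth_per_sec = produce_rate - consume_rate
    # Lag cannot go below zero: a faster consumer simply catches up and waits.
    return max(0, initial_lag + growth_per_sec * seconds)


# Order Service produces 1000 events/sec, Payment Service consumes 200 events/sec.
print(lag_after(1000, 200, 1))   # 800
print(lag_after(1000, 200, 10))  # 8000
```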


7. How to Reduce Lag

1. Scale Consumers

More instances in the same consumer group → parallel processing (only up to the number of partitions; extra instances stay idle)

2. Increase Partitions

More partitions → more consumers in the group can work in parallel
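
Scaling consumers and adding partitions go together: within one consumer group, a partition is read by at most one consumer. The sketch below only illustrates that limit; the function names are hypothetical.

```python
def active_consumers(partitions: int, consumers: int) -> int:
    """Consumers in a group that can actually be assigned at least one partition."""
    return min(partitions, consumers)


def idle_consumers(partitions: int, consumers: int) -> int:
    """Consumers in a group left without any partition."""
    return max(0, consumers - partitions)


# 3 partitions, 5 consumer instances: only 3 do work, 2 sit idle.
print(active_consumers(3, 5), idle_consumers(3, 5))  # 3 2

# After increasing to 6 partitions, all 5 instances are busy.
print(active_consumers(6, 5), idle_consumers(6, 5))  # 5 0
```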

3. Optimize Processing

  • reduce DB calls
  • async processing
  • batch processing

4. Efficient Serialization

  • avoid heavy payloads
  • keep JSON/Avro messages compact (only the fields consumers need)

5. Retry Optimization

  • non-blocking retries
  • retry topics
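
A rough sketch of the retry-topic idea: instead of retrying a failed message in place (which holds up everything behind it), the consumer hands it to another topic and moves on. Everything here is an assumption for illustration: the `.retry` and `.dlt` topic suffixes, the `MAX_ATTEMPTS` limit, and the `Record` class are invented, and no real Kafka client is involved.

```python
from dataclasses import dataclass

MAX_ATTEMPTS = 3  # assumed limit before a message is given up on


@dataclass
class Record:
    topic: str
    value: str
    attempt: int = 0


def route_failed(record: Record, main_topic: str) -> Record:
    """Decide where a failed record goes next, so the main consumer is not blocked."""
    next_attempt = record.attempt + 1
    if next_attempt >= MAX_ATTEMPTS:
        # Out of attempts: park it in the dead-letter topic (DLT).
        return Record(f"{main_topic}.dlt", record.value, next_attempt)
    # Otherwise hand it to the retry topic, processed by a separate consumer.
    return Record(f"{main_topic}.retry", record.value, next_attempt)


record = Record("order.created.v1", "order-42")
first = route_failed(record, "order.created.v1")
second = route_failed(first, "order.created.v1")
third = route_failed(second, "order.created.v1")

print(first.topic)   # order.created.v1.retry
print(second.topic)  # order.created.v1.retry
print(third.topic)   # order.created.v1.dlt
```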

8. Lag in Multi-Partition Scenario

Partition 0 → Lag = 5
Partition 1 → Lag = 2
Partition 2 → Lag = 8

Total lag = 15
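
The same sum in code, with one addition that is often useful: finding the partition that is furthest behind. The dictionary simply restates the numbers above.

```python
# Partition number -> lag, taken from the example above.
partition_lags = {0: 5, 1: 2, 2: 8}

total_lag = sum(partition_lags.values())
worst_partition = max(partition_lags, key=partition_lags.get)

print(total_lag)        # 15
print(worst_partition)  # 2
```

A large total with one dominant partition usually points at uneven load on that partition rather than at a consumer that is too slow overall.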


9. Monitoring Lag

Tools:

  • Kafka CLI
  • Prometheus + Grafana
  • Kafka UI tools

Why Lag Monitoring is Critical

Lag shows:

  • system health
  • consumer performance
  • bottlenecks
  • scaling needs

10. Real Failure Scenario

Consumer down for 1 hour

Producer continues

After restart:

Consumer processes backlog
Lag reduces gradually
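
How long "gradually" takes depends on how much faster the consumer is than the producer. The sketch below uses invented rates (the scenario above gives none) and assumes they stay constant; `catch_up_seconds` is an illustrative helper.

```python
def catch_up_seconds(backlog: int, produce_rate: float, consume_rate: float):
    """Seconds until lag reaches zero, or None if the consumer never catches up."""
    drain_rate = consume_rate - produce_rate  # net messages removed from the backlog per second
    if drain_rate <= 0:
        return None
    return backlog / drain_rate


# Assumed numbers: producer at 100 msgs/sec, consumer down for 1 hour.
backlog = 100 * 3600  # 360000 messages waiting after the outage

# After restart the consumer handles 300 msgs/sec while the producer keeps sending 100.
print(catch_up_seconds(backlog, produce_rate=100, consume_rate=300))  # 1800.0

# A consumer no faster than the producer never clears the backlog.
print(catch_up_seconds(backlog, produce_rate=100, consume_rate=100))  # None
```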


11. Key Takeaways

Consumer Lag

  • Real Kafka metric
  • Measures backlog
  • Must monitor

Producer Lag

  • Not official
  • Means delay in sending

Important Rule

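Lag by itself does not lose data
It only delays processing
Exception: if the consumer stays behind longer than the topic's retention (retention.ms / retention.bytes), unread messages are deleted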


Final Understanding

Lag answers:

“How far behind is my consumer?”

