Re: Implementing Efficient Caching Strategies
Building on ByteMaster's point about Redis, I've found its pub/sub capabilities very powerful for cache invalidation. When data changes, you can publish a message to a Redis channel, and your backend services can subscribe to that channel to invalidate relevant cache entries.
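For example, here's a minimal sketch using redis-py; the channel name "cache-invalidation" and the update_user/invalidation_listener helpers are just illustrative, not from any particular codebase:

import redis

r = redis.Redis(decode_responses=True)

# Writer side: after persisting a change, announce which cache key is stale.
def update_user(user_id, new_data):
    # ... write new_data to the database ...
    r.publish("cache-invalidation", f"user:{user_id}")

# Subscriber side: each backend service runs a listener that evicts
# the stale entry when an invalidation message arrives.
def invalidation_listener():
    pubsub = r.pubsub()
    pubsub.subscribe("cache-invalidation")
    for message in pubsub.listen():
        if message["type"] == "message":
            r.delete(message["data"])

Keep in mind that Redis pub/sub is fire-and-forget: a subscriber that's down when the message is published never sees it, so you still want TTLs on cache entries as a backstop.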
For cache stampedes (the thundering herd problem), a common pattern is locking. On a cache miss, a process tries to acquire a lock for that specific resource. If it succeeds, it fetches the data and updates the cache. If another process already holds the lock, it waits for the cache to be populated and then reads from the cache.
A simple sketch of a locked cache in Python (redis-py):
import json
import time

import redis

r = redis.Redis(decode_responses=True)

LOCK_EXPIRY = 10  # seconds; should comfortably exceed the worst-case fetch time

def get_or_set_with_lock(key, fetch_func, expiry_time):
    value = r.get(key)
    if value is not None:
        return json.loads(value)

    # Cache miss: try to become the single process that fetches this key.
    lock_key = f"lock:{key}"
    if r.set(lock_key, "locked", nx=True, ex=LOCK_EXPIRY):
        try:
            data = fetch_func()
            r.set(key, json.dumps(data), ex=expiry_time)
            return data
        finally:
            r.delete(lock_key)
    else:
        # Another process is already fetching; wait briefly and retry,
        # which will normally hit the now-populated cache.
        time.sleep(0.1)
        return get_or_set_with_lock(key, fetch_func, expiry_time)
# Usage:
# def fetch_user_profile(user_id):
#     # ... fetch from DB ...
#     return user_profile_data
#
# user_data = get_or_set_with_lock(f"user:{user_id}", lambda: fetch_user_profile(user_id), 600)
This ensures only one process performs the expensive fetch at a time; everyone else waits briefly and reads the freshly populated entry from the cache.
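One caveat with the sketch above: if fetch_func takes longer than LOCK_EXPIRY, the lock expires, another process acquires it, and our finally block then deletes that other process's lock. A common refinement (again just a sketch, reusing r and LOCK_EXPIRY from above) is to store a unique token as the lock value and release only if the token still matches, done atomically with a server-side Lua script:

import uuid

# Compare-and-delete must be atomic, so it runs as a Lua script on the server.
RELEASE_SCRIPT = """
if redis.call('get', KEYS[1]) == ARGV[1] then
    return redis.call('del', KEYS[1])
else
    return 0
end
"""

def acquire_lock(lock_key):
    token = str(uuid.uuid4())  # unique per acquisition attempt
    if r.set(lock_key, token, nx=True, ex=LOCK_EXPIRY):
        return token
    return None

def release_lock(lock_key, token):
    # Deletes the lock only if we still own it (token matches).
    r.eval(RELEASE_SCRIPT, 1, lock_key, token)

With this, a slow fetcher whose lock has expired simply releases nothing instead of clobbering the new owner's lock.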