They needed someone to fix the speed problems, stop the crashes, and make the system strong enough for future growth, all without spending too much money.
My Role
I joined as a backend developer. I studied their existing setup, found the weak points, and made big improvements step by step. Here is exactly what I did and why.
Tech Stack They Were Using
- Backend: Python (FastAPI/Django) + some Node.js microservices
- Frontend: React (web) + Flutter (mobile apps)
- Database: PostgreSQL
- Other: Redis (which I introduced later), Celery/RQ, Docker, AWS/GCP cloud
What I Did – Step by Step (with simple explanations)
- Made the Application Stateless (Very Important) Old problem: each server stored user sessions in its own memory. If that server died, the user got logged out, and it was hard to add more servers. Solution:
- Removed sessions from server memory
- Used Redis to store sessions (very fast)
- Switched to JWT tokens for mobile app login (no server-side session needed). Result: we can now start or stop any number of servers at any time. When traffic comes, more servers are added automatically with no problem.
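The idea behind stateless tokens can be sketched with the standard library alone (the real project would use a JWT library like PyJWT; the secret and claim names here are illustrative, not from the actual codebase). The server signs the user's identity and an expiry into the token itself, so any server can verify a request without a shared session store:

```python
import base64
import binascii
import hashlib
import hmac
import json
import time

SECRET = b"replace-with-a-real-secret"  # hypothetical; load from env/secrets in production

def issue_token(user_id: str, ttl_seconds: int = 3600) -> str:
    """Create a signed, self-contained token (no server-side session)."""
    payload = json.dumps({"sub": user_id, "exp": time.time() + ttl_seconds}).encode()
    sig = hmac.new(SECRET, payload, hashlib.sha256).digest()
    return (base64.urlsafe_b64encode(payload).decode()
            + "." + base64.urlsafe_b64encode(sig).decode())

def verify_token(token: str):
    """Return the user id if signature and expiry check out, else None."""
    try:
        payload_b64, sig_b64 = token.split(".")
        payload = base64.urlsafe_b64decode(payload_b64)
        sig = base64.urlsafe_b64decode(sig_b64)
    except (ValueError, binascii.Error):
        return None
    expected = hmac.new(SECRET, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(sig, expected):
        return None  # tampered token
    claims = json.loads(payload)
    if claims["exp"] < time.time():
        return None  # expired token
    return claims["sub"]
```

Because verification needs only the shared secret, every server (old or newly autoscaled) can authenticate the same token, which is what makes the fleet stateless.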
- Put All Static Files on a CDN Old way: images, CSS, JS, and videos were served from the same server, which was a very heavy load. Solution:
- Moved everything static (photos, videos, app icons, JS bundles) to a CDN (Cloudflare or AWS CloudFront)
- The CDN has edge locations across India and worldwide, so files load in 50–100 ms instead of 1–2 seconds. Result: main server load dropped by 40–50%, and pages open much faster.
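For a CDN to help, the origin has to send cache-friendly headers. A minimal NGINX sketch (the `/static/` path is illustrative; this assumes assets are fingerprinted, e.g. `app.3f2a1c.js`, so they can be cached forever):

```nginx
# Long-lived, immutable cache headers on fingerprinted assets so the CDN
# (CloudFront/Cloudflare) can serve them from edge locations without
# re-fetching from the origin.
location /static/ {
    add_header Cache-Control "public, max-age=31536000, immutable";
    expires 1y;
}
```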
- Added Smart Caching with Redis Many things don't change every second (user profiles, product lists, settings). Solution:
- Stored this data in Redis (a super-fast in-memory store)
- Set a time limit (TTL) on each entry, for example 30 minutes for user profiles and product lists, while prices stay live (never cached)
- When a user updates their profile, we clear the old cache entry. Result: 80–90% of read requests are now answered from cache, not the database, so database load dropped a lot.
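This is the classic cache-aside pattern. In production it runs against Redis (`SETEX`/`GET`/`DEL`); the sketch below uses a tiny in-process stand-in so the flow is self-contained, and the key format and 30-minute TTL are illustrative:

```python
import time

class TTLCache:
    """Minimal in-process stand-in for Redis SETEX/GET/DEL (illustration only)."""
    def __init__(self):
        self._store = {}  # key -> (value, expires_at)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.time() >= expires_at:
            del self._store[key]  # lazy expiry, like Redis TTL
            return None
        return value

    def setex(self, key, ttl, value):
        self._store[key] = (value, time.time() + ttl)

    def delete(self, key):
        self._store.pop(key, None)

cache = TTLCache()

def get_profile(user_id, load_from_db):
    """Cache-aside read: try cache first, fall back to the DB, then cache for 30 min."""
    key = f"profile:{user_id}"
    profile = cache.get(key)
    if profile is None:
        profile = load_from_db(user_id)      # slow path: hits PostgreSQL
        cache.setex(key, 30 * 60, profile)   # fast path for the next 30 minutes
    return profile

def update_profile(user_id, save_to_db, new_profile):
    """Write path: update the DB, then invalidate the stale cache entry."""
    save_to_db(user_id, new_profile)
    cache.delete(f"profile:{user_id}")
```

Invalidating on write (rather than updating the cache in place) keeps the logic simple: the next read repopulates the cache from the source of truth.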
- Fixed Database Problems (PostgreSQL) Old issues:
- Too many connections from the app → the database hung
- All reads and writes going to the same DB → slow
- Some queries were very slow because proper indexes were missing
What I did:
- Added PgBouncer to manage connections: only a few real connections reach the DB, and the rest wait in the pool.
- Added one read replica: all read queries (user data, feeds, search) go to the replica, while writes still go to the main DB.
- Found slow queries in the logs → added the missing indexes → removed SELECT * → fixed N+1 query patterns. Result: the database became 3–5x faster, with no more connection errors.
- The database now also runs in a separate VPC behind load balancers
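The read/write split can live in a small routing helper in the app. This is a hypothetical sketch (the DSNs, PgBouncer hosts, and port 6432 are made up for illustration), and the check is deliberately naive: a `WITH` query containing a data-modifying CTE would be misrouted, so a real codebase should tag read-only queries explicitly rather than sniff SQL text:

```python
# Hypothetical connection strings; real values come from config/secrets.
# Both go through PgBouncer (conventionally port 6432) for pooling.
PRIMARY_DSN = "postgresql://app@pgbouncer-primary:6432/appdb"
REPLICA_DSN = "postgresql://app@pgbouncer-replica:6432/appdb"

def pick_dsn(sql: str) -> str:
    """Route read-only statements to the replica, everything else to the primary."""
    stripped = sql.lstrip()
    first_word = stripped.split(None, 1)[0].upper() if stripped else ""
    # SELECT (and read-only WITH ... SELECT) can tolerate slight replica lag;
    # INSERT/UPDATE/DELETE must go to the primary.
    return REPLICA_DSN if first_word in ("SELECT", "WITH") else PRIMARY_DSN
```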
- Moved Heavy Work to the Background Old problem: when a user uploaded a photo, the server processed it (resize, thumbnail) while the user waited 10–20 seconds. The same happened for sending emails, notifications, and reports. Solution:
- Used Celery (with Redis or RabbitMQ as queue)
- When a user uploads → the API says "success" in under 1 second → processing happens in a background worker. Result: the API is always fast and users are happy. Even with 1,000 uploads at the same time, there is no slowdown.
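The shape of this pattern can be shown with a plain in-process queue and worker thread (in production the queue is Redis/RabbitMQ and the worker is a Celery process on another machine; the function names here are illustrative):

```python
import queue
import threading

jobs = queue.Queue()   # stand-in for the Redis/RabbitMQ broker
results = []           # stand-in for thumbnails written to storage

def enqueue_thumbnail(photo_id: str) -> dict:
    """API handler: enqueue the heavy work and return immediately."""
    jobs.put(photo_id)
    return {"status": "success", "photo_id": photo_id}  # responds instantly, not after 10-20 s

def worker():
    """Background worker: does the slow processing (a Celery worker in production)."""
    while True:
        photo_id = jobs.get()
        # ...resize image, generate thumbnail, upload to storage (omitted)...
        results.append(photo_id)
        jobs.task_done()

threading.Thread(target=worker, daemon=True).start()
```

The key property is that the API's response time no longer depends on how long the processing takes; the queue absorbs bursts and workers drain it at their own pace.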
- Added Rate Limiting & Protection Some bad users or bots were sending 1,000 requests per second, killing the server. Solution:
- Added rate limits in the API gateway / NGINX
- Example: max 100 requests per minute per IP / per user
- Also added basic WAF (Web Application Firewall) rules. Result: attack traffic was stopped, and real users were never affected.
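The 100-requests-per-minute-per-IP rule can be sketched as a sliding-window limiter (in practice NGINX's `limit_req` or the gateway does this; this in-process version is only to show the logic, and the `now` parameter exists so the behavior is easy to test deterministically):

```python
import time
from collections import defaultdict

WINDOW_SECONDS = 60
MAX_REQUESTS = 100  # per client per window, matching the rule above

_hits = defaultdict(list)  # client_ip -> list of request timestamps

def allow_request(client_ip: str, now: float = None) -> bool:
    """Sliding-window limiter: allow at most MAX_REQUESTS per WINDOW_SECONDS per IP."""
    now = time.time() if now is None else now
    # Drop timestamps that have fallen out of the window.
    window = [t for t in _hits[client_ip] if now - t < WINDOW_SECONDS]
    _hits[client_ip] = window
    if len(window) >= MAX_REQUESTS:
        return False  # over the limit: reject (HTTP 429 in the API layer)
    window.append(now)
    return True
```

A request that is rejected here never reaches the application or the database, which is why the limiter protects the whole stack, not just the web tier.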
- Added Monitoring So We Can See Problems Fast Solution:
- Used Prometheus + Grafana dashboard
- Watching: CPU, memory, requests per second, error rate, database connections, cache hit %, slow queries
- Set alerts to WhatsApp/Slack/email when p95 latency > 500 ms or errors > 1% (we plan to remove the p95 alert soon; it costs too much and is complicated). Result: we know about a problem before users complain.
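The two alert thresholds above map naturally to Prometheus alerting rules. This is a hedged sketch: the metric names (`http_request_duration_seconds_bucket`, `http_requests_total`) follow common client-library conventions but are not taken from the actual project:

```yaml
groups:
  - name: api-alerts
    rules:
      - alert: HighP95Latency
        # p95 over the last 5 minutes, computed from a latency histogram
        expr: histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket[5m])) by (le)) > 0.5
        for: 5m
        annotations:
          summary: "p95 latency above 500 ms for 5 minutes"
      - alert: HighErrorRate
        # share of 5xx responses over all responses
        expr: sum(rate(http_requests_total{status=~"5.."}[5m])) / sum(rate(http_requests_total[5m])) > 0.01
        for: 5m
        annotations:
          summary: "error rate above 1% for 5 minutes"
```

The `for: 5m` clause is what keeps a single slow request from paging anyone: the condition must hold continuously before the alert fires.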
- Enabled Autoscaling Solution:
- Put all services in Docker containers
- Used cloud autoscaling (AWS ECS / GCP Cloud Run / Kubernetes)
- Rule: if CPU > 70% or latency is high → add more servers automatically. Result: during a 10x spike (like a Christmas sale), the system handles it automatically without manual work. (I'm still adding more checks to this rule to save more cost.)
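On Kubernetes, the CPU > 70% rule is exactly what a HorizontalPodAutoscaler expresses (the deployment name and replica bounds below are illustrative; ECS and Cloud Run have equivalent settings):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api          # the containerized API service
  minReplicas: 2       # never below 2, so one pod dying is not an outage
  maxReplicas: 20      # cost ceiling during a 10x spike
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # add pods when average CPU exceeds 70%
```

Scaling on latency as well (the second half of the rule) needs a custom or external metric, which is one of the extra checks still being worked on.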
What I Delivered in First 15–20 Days (Quick Wins)
- Static files → CDN
- Redis cache for hot data
- PgBouncer + 1 read replica
- Background jobs for email & image processing
- Basic rate limiting
- Simple monitoring dashboard + alerts
After these quick changes → speed improved 4–5x, and there were no crashes during spikes.
Final Result
- Handled 25,000 daily visits plus sudden 10x spikes with ease
- Page load time reduced from 4–6 seconds to 1–2 seconds
- Zero downtime during big traffic days
- Database stable, with no more "too many connections" errors
- The client was very happy: they could run bigger marketing campaigns without fear
- The cost increase was small because with autoscaling we pay for extra servers only when traffic is high
This project taught me that real scaling is not about big tools. We are still migrating the VPC and Redis setup and working on techniques to reduce server cost at high load.