Traxium is our AI fleet management platform for Indian trucking SMEs. It handles live GPS at sub-30-second refresh, AI-based delay prediction, fuel anomaly detection, GST-compliant invoicing, and WhatsApp alerts — across hundreds of fleets and thousands of vehicles. This post is the engineering story behind it: the architecture decisions, the trade-offs, what we got right, and what we'd do differently if we started over.

If you're building a real-time multi-tenant SaaS platform — fleet, IoT, marketplace, logistics — there's something here worth borrowing.

The problem shape

Fleet management for Indian trucking has a specific set of constraints that drove most of our architecture decisions:

  • High-volume telemetry ingestion: every GPS device sends a packet every 10–30 seconds. A fleet of 1,000 trucks generates roughly 3 million events per day at the 30-second end of that range and closer to 9 million at the 10-second end. We needed to scale that across many fleets cheaply.
  • Multi-tenant isolation — each fleet's data must be cleanly separated, with no chance of cross-tenant leakage.
  • Real-time dashboards — operators expect to see truck positions update live, not on 60-second polling.
  • India compliance — GST invoicing, e-way bill integration, IRN generation. Native, not bolted on.
  • WhatsApp as a first-class channel — drivers and customers don't open new apps.
  • Spotty connectivity — devices go offline on long-haul routes. Data must buffer and sync on reconnect, in order.
  • India-market cost sensitivity — we couldn't justify ₹2L/month cloud spend per customer.
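To put the telemetry constraint in numbers, the arithmetic can be sketched in a few lines. The function name is ours, for illustration only, and the sketch assumes every truck reports at a fixed interval all day, which real fleets will not do exactly.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def events_per_day(trucks: int, interval_s: int) -> int:
    """Packets per day if every truck reports once every interval_s seconds."""
    return trucks * SECONDS_PER_DAY // interval_s


# The two ends of the 10-30 second reporting range for a 1,000-truck fleet.
print(events_per_day(1_000, 30))  # 2880000
print(events_per_day(1_000, 10))  # 8640000
```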

Those constraints ruled out a lot of naive architectures. Here's what we landed on.

The 30,000-foot view

Traxium's architecture in one paragraph: GPS devices send raw packets to a TCP/UDP ingestion layer running on ECS. Packets are parsed, validated, and pushed to Kinesis. Stream processors fan packets out to (a) a hot-storage time-series database for live tracking, (b) a data lake for analytics and ML, and (c) the event bus for triggering alerts and workflows. The application backend runs as Node.js services in containers behind an Application Load Balancer. The frontend is a React SPA. ML models for delay prediction and fuel anomaly detection run as scheduled batch jobs against the data lake. WhatsApp integration goes through the Meta Cloud API.

The detail is where the interesting trade-offs are.

1. GPS ingestion: don't try to use HTTP

The most important early decision: GPS devices use proprietary TCP/UDP protocols, not HTTP. Each device manufacturer has its own packet format (Teltonika, Concox, Meitrack, Aquila, JT701, all different). Trying to put devices behind an HTTP layer adds latency, complexity, and a translation layer that breaks every time a manufacturer ships firmware updates.

What we built: a thin ingestion service per major protocol, running as a containerised TCP listener on ECS Fargate behind a Network Load Balancer. Devices connect directly. The service parses, validates, and forwards to Kinesis. Each service is independent, so if Teltonika ships new firmware, only the Teltonika service needs updating.

Why this matters: protocol-specific services scale independently. The Teltonika service handles 60% of our load; the long-tail protocols share a single low-capacity service. We don't pay for capacity we don't need on niche protocols.
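The parse, validate, forward shape of an ingestion service can be sketched as a small parser registry. This is an illustration, not our production code: the "demo-csv" text format is invented (real vendor formats are binary and all different), and `forward` is a stand-in for whatever writes to the stream.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class Position:
    device_id: str
    lat: float
    lon: float
    timestamp: int  # Unix seconds, as reported by the device


Parser = Callable[[bytes], Position]
PARSERS: Dict[str, Parser] = {}


def register(protocol: str) -> Callable[[Parser], Parser]:
    """Decorator that adds a parser to the registry under a protocol name."""
    def wrap(fn: Parser) -> Parser:
        PARSERS[protocol] = fn
        return fn
    return wrap


@register("demo-csv")
def parse_demo_csv(raw: bytes) -> Position:
    # Hypothetical format: b"device_id,lat,lon,unix_ts\n"
    device_id, lat, lon, ts = raw.decode("ascii").strip().split(",")
    lat_f, lon_f = float(lat), float(lon)
    if not (-90 <= lat_f <= 90 and -180 <= lon_f <= 180):
        raise ValueError("coordinates out of range")
    return Position(device_id, lat_f, lon_f, int(ts))


def handle_packet(protocol: str, raw: bytes,
                  forward: Callable[[Position], None]) -> bool:
    """Parse and validate one packet; forward it on success, drop it otherwise."""
    try:
        position = PARSERS[protocol](raw)
    except (KeyError, ValueError):
        return False  # unknown protocol or malformed packet
    forward(position)
    return True
```

Keeping each parser behind the same `Position` output is what lets a single protocol change without touching anything downstream.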

2. Hot vs cold storage

The next critical decision: where does telemetry data live?

Initially we tried PostgreSQL with TimescaleDB for everything. It worked at small scale, then hit a wall around 50 fleets. Writes were fine; reads for live dashboards became expensive. The geometry indexes for spatial queries struggled.

We split storage by access pattern:

  • Hot storage (last 7 days): stays in TimescaleDB on a tuned RDS instance. TimescaleDB is designed for time-series queries, and this tier serves live tracking and recent reports.
  • Cold storage (>7 days): Parquet files on S3 in date-partitioned folders, queried via Athena. Cheap, fast for analytics, terrible for transactional reads — which is fine because nothing transactional reads cold data.
  • Operational data (trips, invoices, customers, users): regular PostgreSQL on RDS. Standard relational model.

This split cut our storage costs by ~70% while improving query performance for the live dashboards.
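The routing rule behind the split is small enough to sketch. The 7-day window comes from the list above; the S3 key layout and function names are assumptions for illustration, using the Hive-style `key=value` folders that Athena can treat as partitions.

```python
from datetime import datetime, timedelta, timezone

HOT_WINDOW = timedelta(days=7)


def storage_tier(event_time: datetime, now: datetime) -> str:
    """Events from the last 7 days are served from hot storage, older ones from cold."""
    return "hot" if now - event_time <= HOT_WINDOW else "cold"


def cold_partition_prefix(event_time: datetime) -> str:
    """Date-partitioned folder for a Parquet file (hypothetical layout)."""
    return (
        f"telemetry/year={event_time:%Y}/"
        f"month={event_time:%m}/day={event_time:%d}/"
    )


now = datetime(2024, 3, 15, tzinfo=timezone.utc)
print(storage_tier(now - timedelta(days=2), now))  # hot
print(storage_tier(now - timedelta(days=8), now))  # cold
```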

3. Multi-tenant isolation — pick a model and commit

SaaS multi-tenancy options, ordered from strongest isolation to lowest cost:

  • Database per tenant: strongest isolation, hardest to operate, very expensive at scale
  • Schema per tenant: good isolation, manageable, breaks at high tenant count
  • Shared schema with tenant_id column: cheapest, hardest to keep isolation watertight

We went with shared schema + strict tenant_id discipline + row-level security in PostgreSQL. Every query goes through an ORM layer that automatically injects the tenant filter. RLS policies on the database side enforce it as a backstop. Cross-tenant queries simply can't happen without explicit super-admin context.
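A minimal sketch of the two layers, under stated assumptions. The application-side part uses an in-memory SQLite table so it runs anywhere; the class, table, and column names are hypothetical. The PostgreSQL policy is included as a string for reference only and is not executed; `app.tenant_id` is an assumed session setting that the application would set per connection.

```python
import sqlite3
from typing import List, Tuple

# Database-side backstop in PostgreSQL (reference only, not executed here).
RLS_DDL = """
ALTER TABLE trips ENABLE ROW LEVEL SECURITY;
CREATE POLICY tenant_isolation ON trips
    USING (tenant_id = current_setting('app.tenant_id'));
"""


class TenantScopedTrips:
    """Data-access object that cannot issue a query without a tenant filter."""

    def __init__(self, conn: sqlite3.Connection, tenant_id: str) -> None:
        self._conn = conn
        self._tenant_id = tenant_id

    def list_trips(self) -> List[Tuple[int, str]]:
        rows = self._conn.execute(
            "SELECT id, destination FROM trips WHERE tenant_id = ? ORDER BY id",
            (self._tenant_id,),
        )
        return rows.fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE trips (id INTEGER PRIMARY KEY, tenant_id TEXT, destination TEXT)"
)
conn.executemany(
    "INSERT INTO trips (tenant_id, destination) VALUES (?, ?)",
    [("fleet-a", "Pune"), ("fleet-b", "Surat"), ("fleet-a", "Nagpur")],
)

print(TenantScopedTrips(conn, "fleet-a").list_trips())  # [(1, 'Pune'), (3, 'Nagpur')]
```

One thing to keep in mind with the database-side policy: PostgreSQL superusers and, by default, table owners bypass row-level security, so the application should connect as a separate, unprivileged role.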

This was the right call for our scale and ops capacity. If we were targeting enterprise customers demanding dedicated infrastructure, schema-per-tenant or database-per-tenant would have made more sense. For SME-focused SaaS at our price point, shared-schema was the only economically viable model.

4. Real-time dashboards: WebSockets, not polling

The mistake everyone makes early: polling. The dashboard polls the API every 5 seconds for truck positions. The API hammers the database. Costs explode. UX feels laggy anyway.

We use WebSockets via Socket.io behind the ALB. The stream processor publishes events to a pub/sub layer (Redis). The WebSocket service subscribes per-tenant and pushes updates to connected dashboards. A truck reports a new position and, roughly 200 ms later, it's visible on the dashboard.
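The per-tenant fan-out is the part worth getting right, so here is an in-memory sketch of the idea. It stands in for the Redis pub/sub layer and the WebSocket service; the class and method names are ours, and a listener here represents one connected dashboard.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, List

Update = Dict[str, object]
Listener = Callable[[Update], None]


class TenantFanout:
    """Routes each position update only to dashboards of the owning tenant."""

    def __init__(self) -> None:
        self._listeners: DefaultDict[str, List[Listener]] = defaultdict(list)

    def subscribe(self, tenant_id: str, listener: Listener) -> None:
        self._listeners[tenant_id].append(listener)

    def publish(self, tenant_id: str, update: Update) -> int:
        """Deliver an update and return how many dashboards received it."""
        listeners = self._listeners.get(tenant_id, [])
        for listener in listeners:
            listener(update)
        return len(listeners)


fanout = TenantFanout()
fleet_a_screen: List[Update] = []
fleet_b_screen: List[Update] = []
fanout.subscribe("fleet-a", fleet_a_screen.append)
fanout.subscribe("fleet-b", fleet_b_screen.append)

fanout.publish("fleet-a", {"truck": "MH12AB1234", "lat": 19.07, "lon": 72.87})
print(len(fleet_a_screen), len(fleet_b_screen))  # 1 0
```

Keying the channel by tenant means isolation holds on the push path too, not only in the database.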

Cost-wise: WebSocket connections are cheap. The bandwidth for real-time updates is much less than the bandwidth wasted on polling that returns the same data.
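A back-of-envelope way to see the wasted bandwidth, using the intervals already mentioned in this post (5-second polling, devices reporting every 10–30 seconds). This is a simplification for a single truck with evenly spaced packets; the function name is ours.

```python
def redundant_poll_fraction(poll_interval_s: float, packet_interval_s: float) -> float:
    """Share of polls that return no new position for one truck."""
    polls_per_packet = packet_interval_s / poll_interval_s
    # Only one poll per packet interval can carry new data; the rest repeat it.
    return max(0.0, 1 - 1 / polls_per_packet)


print(redundant_poll_fraction(5, 10))  # 0.5
print(round(redundant_poll_fraction(5, 30), 3))  # 0.833
```

With push, each position is sent once when it changes, so the redundant share drops to zero by construction.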

5. AI delay prediction — start simple

Our biggest temptation early was to build a fancy deep-learning model for delay prediction. We resisted, and we're glad we did.

The v1 model: gradient-boosted regression on a hand-crafted feature set — current speed vs route average, weather at destination, hour-of-day patterns, historical truck behaviour, traffic forecasts. Built in scikit-learn. Trained nightly on the previous 30 days of data. Predicted ETA delays in 15-minute buckets.
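Two small pieces around the model are easy to show without the training code: turning raw readings into a feature and turning a predicted delay into a 15-minute bucket. This is a sketch with assumptions, not our pipeline: the function names are hypothetical, only two of the features are shown, and rounding up to the next bucket boundary is one reasonable choice among several.

```python
import math
from typing import Dict

BUCKET_MINUTES = 15


def build_features(current_kmh: float, route_avg_kmh: float,
                   hour_of_day: int) -> Dict[str, float]:
    """Hand-crafted features: speed relative to the route average, and hour of day."""
    ratio = current_kmh / route_avg_kmh if route_avg_kmh > 0 else 1.0
    return {"speed_ratio": ratio, "hour_of_day": float(hour_of_day)}


def delay_bucket(predicted_delay_min: float) -> int:
    """Round a predicted delay up to the next 15-minute boundary (0 if on time)."""
    if predicted_delay_min <= 0:
        return 0
    return math.ceil(predicted_delay_min / BUCKET_MINUTES) * BUCKET_MINUTES


print(build_features(30, 60, 14))  # {'speed_ratio': 0.5, 'hour_of_day': 14.0}
print(delay_bucket(37))  # 45
```

Features like `speed_ratio` are also why the model stays interpretable: a ratio of 0.5 means the truck is moving at half the usual pace for that route, which anyone in ops can read.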

That model is now responsible for the bulk of our "AI" value. It's interpretable, fast to train, cheap to run, and produces results good enough that customers cite the delay-prediction feature as their reason for staying.

Lesson: simple models with good features beat complex models with bad features. Especially when you're shipping a SaaS where users want results, not novelty.

6. GST and IRN — the special-case backend

GST integration is its own small empire. IRN generation requires hitting the IRP (Invoice Registration Portal) within strict timeframes. Failed IRNs need replay logic. Different state-of-supply rules apply per consignor/consignee. Cancellation has its own protocol.

We isolated this into a dedicated invoicing service with its own queue, retry logic, and reconciliation processes. It exposes a simple internal API ("issue invoice for trip X") and handles everything underneath. Critically, it has its own dashboard for ops to monitor failed IRNs and manually retry.
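The retry-then-park behaviour can be sketched as follows. Everything here is illustrative: `submit` is a stand-in for a real IRP client, `IrpError` is a hypothetical exception, and the attempt count and backoff values are placeholders rather than our actual settings.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, List, Optional


class IrpError(Exception):
    """Raised by the (hypothetical) IRP client when a submission fails."""


@dataclass
class InvoiceJob:
    trip_id: str
    attempts: int = 0


@dataclass
class IrnWorker:
    submit: Callable[[str], str]  # takes a trip id, returns an IRN
    max_attempts: int = 3
    base_delay_s: float = 2.0
    sleep: Callable[[float], None] = time.sleep
    failed: List[InvoiceJob] = field(default_factory=list)

    def process(self, job: InvoiceJob) -> Optional[str]:
        """Try to issue an IRN with exponential backoff between attempts.

        Jobs that exhaust their attempts are parked in `failed`, where an
        ops dashboard could list them for manual retry.
        """
        while job.attempts < self.max_attempts:
            job.attempts += 1
            try:
                return self.submit(job.trip_id)
            except IrpError:
                if job.attempts < self.max_attempts:
                    self.sleep(self.base_delay_s * 2 ** (job.attempts - 1))
        self.failed.append(job)
        return None
```

Injecting `sleep` keeps the worker testable without real waiting, and the `failed` list is the hook for the reconciliation and manual-retry side of the service.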

Isolating this away from the main app code paid off the first week we tried it. GST is fiddly enough that you don't want it bleeding into your main business logic.

7. WhatsApp as a first-class channel

The Indian context: drivers, customers, and dispatchers live on WhatsApp. Pushing them to install separate apps is friction we couldn't justify.

We integrated the Meta WhatsApp Cloud API directly. Templated message flows for status updates, ETAs, document requests, customer notifications. Inbound message handling for driver POD uploads (photo + GPS = automatic delivery confirmation). Opt-out controls to respect WhatsApp Business policy.

The architectural decision: WhatsApp is treated as a peer to in-app notifications, not a secondary channel. Notification templates are channel-agnostic; the dispatch layer picks the right channel per user preference.
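A minimal sketch of what channel-agnostic templates plus a dispatch layer can look like. The template text, channel names, and sender functions are all hypothetical; in a real system the senders would wrap the WhatsApp Cloud API and the in-app notification service.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# Templates know nothing about the channel they will be sent on.
TEMPLATES: Dict[str, str] = {
    "eta_update": "Truck {truck} is expected at {eta}.",
}


@dataclass(frozen=True)
class User:
    user_id: str
    preferred_channel: str  # e.g. "whatsapp" or "in_app"


Sender = Callable[[User, str], None]


class Dispatcher:
    def __init__(self, senders: Dict[str, Sender], fallback: str = "in_app") -> None:
        self._senders = senders
        self._fallback = fallback

    def notify(self, user: User, template: str, **params: str) -> str:
        """Render the template once, then send it on the user's preferred channel."""
        body = TEMPLATES[template].format(**params)
        channel = user.preferred_channel
        if channel not in self._senders:
            channel = self._fallback
        self._senders[channel](user, body)
        return channel


sent = []
dispatcher = Dispatcher({
    "whatsapp": lambda user, body: sent.append(("whatsapp", user.user_id, body)),
    "in_app": lambda user, body: sent.append(("in_app", user.user_id, body)),
})
dispatcher.notify(User("driver-7", "whatsapp"), "eta_update",
                  truck="MH12AB1234", eta="18:30")
print(sent[0])
# ('whatsapp', 'driver-7', 'Truck MH12AB1234 is expected at 18:30.')
```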

8. What we'd do differently

Several things, in retrospect:

  • Start with event sourcing for trip events from day one. We added it later for audit and analytics. Earlier would have saved a painful migration.
  • Containerise everything from week one. We had some long-lived EC2 instances early that became technical debt. ECS Fargate everywhere would have been cleaner.
  • Build the protocol-translation layer more generically. Adding a new GPS device type still requires a small dev effort. A more declarative DSL would have made this near-config-only.
  • Take observability seriously from the start. We didn't have proper distributed tracing for the first year. Debugging issues across services was painful. OpenTelemetry from day one would have helped.
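On the protocol-translation point above, here is a rough idea of what "near-config-only" could mean: describe a packet layout as data and let one generic decoder handle it. This is a thought experiment, not something we have built. The layout below is invented and does not match any real device; a real DSL would also need variable-length fields, checksums, and framing.

```python
import struct
from typing import Dict, List, Tuple

# (field name, struct format code, scale factor) for a hypothetical
# fixed-length, big-endian packet.
FieldSpec = Tuple[str, str, float]

DEMO_SPEC: List[FieldSpec] = [
    ("device_id", "I", 1),     # unsigned 32-bit integer
    ("lat", "i", 1e-7),        # signed 32-bit, degrees * 10^7
    ("lon", "i", 1e-7),        # signed 32-bit, degrees * 10^7
    ("timestamp", "I", 1),     # Unix seconds
]


def decode(spec: List[FieldSpec], raw: bytes) -> Dict[str, float]:
    """Decode a packet using only the declarative spec; no per-device code."""
    fmt = ">" + "".join(code for _, code, _ in spec)
    values = struct.unpack(fmt, raw)  # raises struct.error on a length mismatch
    return {name: value * scale for (name, _, scale), value in zip(spec, values)}


packet = struct.pack(">IiiI", 42, 190_760_000, 728_777_000, 1_700_000_000)
decoded = decode(DEMO_SPEC, packet)
print(decoded["device_id"], decoded["timestamp"])  # 42 1700000000
```

Supporting a new fixed-layout device would then mean adding a new spec list rather than writing a new parser.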

The general lessons

If we were starting any real-time multi-tenant SaaS today, the lessons we'd carry over:

  1. Protocol-specific ingestion services beat universal HTTP wrappers
  2. Split storage by access pattern — hot, cold, operational — early
  3. Shared-schema multi-tenancy with RLS is the right default for SME-scale SaaS
  4. WebSockets for live dashboards. Always.
  5. Start with simple, interpretable ML models. Add complexity only when results plateau.
  6. Isolate compliance-heavy modules (GST, billing, regulated workflows) from main business logic
  7. Treat WhatsApp / SMS as first-class channels in India, not afterthoughts
  8. Containerise everything. ECS Fargate or Kubernetes from day one.
  9. Observability — tracing, metrics, logs — from day one, not month 12

The full Traxium story has more to it — the AI fuel anomaly detection, the offline-first mobile apps, the WhatsApp-first POD workflow — but those are posts for another day.

If you want to see what Traxium actually does as a product, see our Traxium guide. If you're building something similar and want to compare notes, talk to us — engineering conversations are the best part of our day.