Polaris
Inference at the edge.

Defined what 'fast enough' meant for a real-time inspection rig — then made it stay that way.

Polaris · Infrastructure

✺ — The problem

Polaris runs inspection cameras on manufacturing lines. Their cloud-hosted model was too slow for the conveyor speed. Sending every frame to a GPU farm in another region was costing more than the savings the model was supposed to capture.

Sector

Industrial IoT

Year

2024

Duration

18 weeks

Team

1 Principal · 2 Engineers · 1 SRE

Stack

Rust · ONNX · TensorRT · Grafana · Tailscale

✺ — Approach

The same arc as every engagement — tuned to this problem.

01

Define · The latency budget

We sat next to a working line for two days. The frame budget was 84ms per inspection, end-to-end — round trip, model, response. Anything slower meant rejected parts piled up. That number became the design constraint, not a target.
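Treating the budget as a constraint rather than a target can be sketched as a per-frame pass/fail check. This is a minimal illustration, not the production code: the 84 ms figure and the three stages (round trip, model, response) come from the measurement above, while `StageTimings`, `headroom_ms`, `within_budget`, and the example timings are hypothetical names and values chosen for the sketch.

```python
from dataclasses import dataclass

# End-to-end budget per inspection, as measured at the line.
FRAME_BUDGET_MS = 84.0


@dataclass(frozen=True)
class StageTimings:
    """Per-frame timings in milliseconds (hypothetical structure)."""
    round_trip_ms: float  # transport to the model and back
    model_ms: float       # inference time
    response_ms: float    # acting on the result

    @property
    def total_ms(self) -> float:
        return self.round_trip_ms + self.model_ms + self.response_ms


def headroom_ms(t: StageTimings, budget_ms: float = FRAME_BUDGET_MS) -> float:
    """Milliseconds left in the budget; negative means the frame was late."""
    return budget_ms - t.total_ms


def within_budget(t: StageTimings, budget_ms: float = FRAME_BUDGET_MS) -> bool:
    """A constraint, not a target: the frame either fits or it does not."""
    return headroom_ms(t, budget_ms) >= 0.0


# Illustrative split only; the per-stage numbers are assumptions.
example = StageTimings(round_trip_ms=20.0, model_ms=35.0, response_ms=7.0)
print(example.total_ms, headroom_ms(example), within_budget(example))
```

A check like this would typically run per frame or per percentile, so that a regression shows up as a budget violation rather than as a slightly worse average.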

02

Build · Quantized, on-prem, falls back to cloud

An INT8-quantized model runs on a ruggedized edge box at every line. When the edge model is uncertain about a frame, that frame falls back to cloud inference, so the line never stops and cloud cost only kicks in for the hard cases.
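The routing idea can be sketched as a confidence gate. This is a hedged illustration under stated assumptions: the 0.90 threshold, the `inspect` function, the `(label, confidence)` return shape, and the behaviour when the cloud is unreachable are all hypothetical, not details taken from the Polaris deployment.

```python
from typing import Callable, Tuple

# (label, confidence in [0, 1]); the shape is an assumption for this sketch.
Prediction = Tuple[str, float]
Model = Callable[[bytes], Prediction]

CONFIDENCE_FLOOR = 0.90  # hypothetical threshold, tuned per line in practice


def inspect(frame: bytes, edge_model: Model, cloud_model: Model,
            floor: float = CONFIDENCE_FLOOR) -> Tuple[str, str]:
    """Return (label, source). Only uncertain frames leave the edge box."""
    label, confidence = edge_model(frame)
    if confidence >= floor:
        return label, "edge"
    try:
        cloud_label, _ = cloud_model(frame)
        return cloud_label, "cloud"
    except (ConnectionError, TimeoutError):
        # Assumption: if the cloud is unreachable, keep the line moving
        # with the edge answer and mark it for later review.
        return label, "edge-unverified"


# Stub models standing in for the real edge and cloud endpoints.
def confident_edge(frame: bytes) -> Prediction:
    return ("pass", 0.99)


def unsure_edge(frame: bytes) -> Prediction:
    return ("pass", 0.55)


def cloud(frame: bytes) -> Prediction:
    return ("reject", 0.97)


print(inspect(b"frame", confident_edge, cloud))  # ('pass', 'edge')
print(inspect(b"frame", unsure_edge, cloud))     # ('reject', 'cloud')
```

The design choice worth noting is that the threshold is the cost dial: raising it sends more frames to the cloud, lowering it keeps more decisions on the edge.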

03

Operate · Observability everywhere

Every edge node exports latency, confidence histograms, and drift metrics. A misbehaving line surfaces in dashboards before a foreman notices the reject rate. New model versions roll out behind a per-line flag.
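As a rough sketch of what a confidence histogram, a drift signal, and a per-line flag could look like: the code below bins model confidences, compares a line's current histogram against a baseline using total variation distance, and resolves a model version per line. The choice of total variation distance, the bin count, and every name here (`confidence_histogram`, `drift_score`, `model_version_for`, the line IDs and version strings) are assumptions for illustration; the source does not say which drift metric or flag mechanism was used.

```python
from typing import Dict, List, Sequence


def confidence_histogram(confidences: Sequence[float], bins: int = 10) -> List[float]:
    """Normalized histogram of confidences in [0, 1]."""
    if not confidences:
        raise ValueError("need at least one confidence value")
    counts = [0] * bins
    for c in confidences:
        # Clamp so that 1.0 lands in the last bin and stray values stay in range.
        idx = max(0, min(int(c * bins), bins - 1))
        counts[idx] += 1
    total = len(confidences)
    return [n / total for n in counts]


def drift_score(baseline: Sequence[float], current: Sequence[float]) -> float:
    """Total variation distance: 0.0 for identical histograms, 1.0 for disjoint ones."""
    if len(baseline) != len(current):
        raise ValueError("histograms must have the same number of bins")
    return 0.5 * sum(abs(b - c) for b, c in zip(baseline, current))


def model_version_for(line_id: str, flags: Dict[str, str], default: str) -> str:
    """Per-line flag: a line runs the default model unless it is opted in."""
    return flags.get(line_id, default)


baseline = confidence_histogram([0.95, 0.97, 0.92, 0.99])
drifted = confidence_histogram([0.15, 0.12, 0.18, 0.11])
print(drift_score(baseline, baseline))  # 0.0
print(drift_score(baseline, drifted))   # 1.0

flags = {"line-3": "v2"}  # hypothetical: only line-3 is on the new model
print(model_version_for("line-3", flags, default="v1"))  # v2
print(model_version_for("line-5", flags, default="v1"))  # v1
```

A score like this is cheap enough to compute on the edge node and export alongside latency, which is what lets a dashboard flag a line before the reject rate moves.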

✺ — Outcome

Three numbers we’d defend in public.

62ms

median end-to-end inspection latency

−71%

monthly inference cost vs. cloud-only

Zero

downtime on rollout — six lines, one weekend

“Most studios would have sold us a bigger GPU bill. They asked what our latency budget actually was, then shipped something a foreman could rack and forget.”

Director of Engineering, Polaris