Jacob Smythe · Portfolio


City Pulse: Debugging and Hardening a Local-First Traffic Vision Stack

A build and debugging retrospective covering pipeline failures, YOLO container fixes, runtime tuning, and pragmatic counting improvements for a local-first traffic analytics stack.

Published May 3, 2026

  • mlops
  • computer-vision
  • streamlit
  • docker
  • timescaledb
  • redis

City Pulse went through a full hardening cycle: from “video plays but chart is flat” to a reproducible local stack with practical debugging controls and better counting behavior.

Objective

Build a local-first MLOps traffic pipeline on public Maryland DOT HLS streams:

  1. Sample frames from HLS.
  2. Queue frames in Redis.
  3. Run YOLO inference through MLServer and aiSSEMBLE inference runtime.
  4. Write counts to TimescaleDB.
  5. Show trends and operational state in Streamlit.
  6. Optionally generate daily text briefs with Sumy.

Tech Stack

  • Python 3.12 (local venv)
  • Redis
  • TimescaleDB (Postgres)
  • MLServer
  • aissemble-inference-yolo
  • aissemble-inference-sumy
  • Docker Compose
  • Streamlit
  • Public MDOT HLS (playlist.m3u8)

Core processes:

  • city-pulse-ingest
  • city-pulse-vision-worker
  • streamlit run src/city_pulse/dashboard/app.py
  • Docker services: redis, timescaledb, yolo (optional sumy)

Architecture

A key lesson: preview video and chart data are separate paths. The video can look live while the pipeline is not writing a single row.

flowchart TB
    A["MDOT HLS\nplaylist.m3u8"] --> B["Ingest\ncity-pulse-ingest"]
    B --> C["Redis Queue"]
    C --> D["Vision Worker\ncity-pulse-vision-worker"]
    D --> E["YOLO Runtime\nMLServer + aiSSEMBLE"]
    E --> F["Vehicle Counts\nTimescaleDB"]
    F --> G["Streamlit Dashboard"]

    B -. "sample interval + JPEG quality" .-> C
    D -. "ROI + temporal IoU dedup" .-> F
    D -. "debug overlay payload" .-> H["Redis Overlay"]
    H --> G

What Broke and How I Fixed It

1) Live preview worked, chart stayed flat

Symptoms:

  • Stream preview looked healthy.
  • Chart was empty or flat.

Root causes:

  • Inference service not healthy or not loaded.
  • Weak visibility into ingest and vision failures.
  • Camera key mismatch between seeded data and live camera.

Fixes:

  • Added scripts/preflight_stack.py for Redis ping, Postgres check, YOLO readiness polling (/v2/health/ready), and camera/data hints.
  • Updated one-command runner to emit logs/ingest.log and logs/vision.log.
  • Clarified preview-vs-pipeline behavior in docs and startup messaging.
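The readiness poll in scripts/preflight_stack.py boils down to a retry loop. A minimal sketch, with the HTTP probe injected so the loop itself is testable (in preflight the probe would GET MLServer's /v2/health/ready endpoint and check for a 200):

```python
import time
from typing import Callable


def wait_until_ready(probe: Callable[[], bool], attempts: int = 30,
                     delay_s: float = 1.0, sleep=time.sleep) -> bool:
    """Poll `probe` until it returns True or attempts run out.

    `probe` is any zero-argument callable; for the YOLO container it would
    wrap an HTTP GET against http://localhost:8080/v2/health/ready.
    """
    for _ in range(attempts):
        if probe():
            return True
        sleep(delay_s)
    return False
```

The same loop works for the Redis ping and Postgres check by swapping in a different probe.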

2) YOLO container started, then disappeared

Symptoms:

  • docker compose ps only showed Redis and TimescaleDB.
  • :8080 health checks failed.

Root-cause sequence and fixes:

  1. YOLO image copied full models/ (including sumy/), causing ModuleNotFoundError: aissemble_inference_sumy. Fix: copy only models/yolov8/ in YOLO Dockerfile.
  2. Missing ultralytics in YOLO image. Fix: add ultralytics to image requirements.
  3. OpenCV GUI dependency failure on slim image (ImportError: libxcb.so.1). Fix: switch to opencv-python-headless.

Result: YOLO became healthy and served inference.

3) Counts looked low relative to visible traffic

Why:

  • Sparse frame sampling relative to the stream's full frame rate, so fast vehicles can pass entirely between samples.
  • Detection misses on compressed HLS (small/far/night objects).
  • Confidence threshold too strict.
  • yolov8n traded recall for speed.

Fixes and tuning:

  • Added explanatory dashboard copy.
  • Added 1-minute buckets and smoothed trend view.
  • Added runtime sampling controls (Redis-backed).
  • Added visual debug overlay.
  • Upgraded model to yolov8s.pt.
  • Tuned defaults: lower confidence, faster sampling, higher JPEG quality.
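The 1-minute bucketing and smoothed trend can be sketched in plain Python (the dashboard presumably does this in SQL or pandas; this just shows the idea):

```python
from collections import defaultdict


def bucket_counts(rows, bucket_s: int = 60):
    """Sum per-frame vehicle counts into fixed time buckets.

    `rows` is an iterable of (unix_ts, count) pairs, e.g. one row per
    processed frame read back from TimescaleDB.
    """
    buckets = defaultdict(int)
    for ts, count in rows:
        buckets[int(ts) // bucket_s * bucket_s] += count
    return dict(sorted(buckets.items()))


def moving_average(values, window: int = 3):
    """Trailing moving average used to smooth the bucketed trend."""
    out = []
    for i in range(len(values)):
        lo = max(0, i - window + 1)
        out.append(sum(values[lo:i + 1]) / (i + 1 - lo))
    return out
```

Bucketing first and smoothing second keeps the chart stable even when sampling is sparse or bursty.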

4) Overlay toggle did nothing in early runs

Root causes:

  • Worker overlay flag not enabled in some runs.
  • UI process state and worker restart state drifted.

Fixes:

  • Enabled VISION_DEBUG_OVERLAY_ENABLED.
  • Restarted worker and stack with updated env.
  • Added UI misconfiguration warnings.
  • Verified overlay payload in Redis.
  • Added periodic overlay fragment refresh.

5) Recounting across adjacent frames

Why:

  • Baseline counted detections per frame, not per tracked identity.

Fixes implemented:

  • Optional ROI filter: VISION_ROI_NORM.
  • Temporal IoU dedup controls:
    • VISION_DEDUP_ENABLED
    • VISION_DEDUP_IOU_THRESHOLD
    • VISION_DEDUP_TTL_SECONDS
  • ROI box rendered on overlay for visual validation.

Limitation:

  • This reduces recounting but is still heuristic. It is not full identity persistence.
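The temporal IoU dedup above can be sketched as follows; the (x1, y1, x2, y2) box format and the TTL-refresh behavior are assumptions about the implementation, mirroring the VISION_DEDUP_IOU_THRESHOLD and VISION_DEDUP_TTL_SECONDS knobs:

```python
import time


def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    if inter == 0.0:
        return 0.0
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


class TemporalDedup:
    """Suppress detections that overlap a recently seen box."""

    def __init__(self, iou_threshold=0.45, ttl_s=3.0, clock=time.monotonic):
        self.iou_threshold = iou_threshold
        self.ttl_s = ttl_s
        self.clock = clock
        self._recent = []  # list of (box, last_seen_ts)

    def filter_new(self, boxes):
        """Return only boxes not matching a recent detection; refresh matches."""
        now = self.clock()
        self._recent = [(b, t) for b, t in self._recent if now - t < self.ttl_s]
        fresh = []
        for box in boxes:
            matched = False
            for i, (seen, _) in enumerate(self._recent):
                if iou(box, seen) >= self.iou_threshold:
                    self._recent[i] = (box, now)  # same vehicle: refresh its TTL
                    matched = True
                    break
            if not matched:
                fresh.append(box)
                self._recent.append((box, now))
        return fresh
```

This is exactly why it stays heuristic: a vehicle that moves more than the IoU threshold allows between samples is recounted, and a long-stationary one is only held back as long as its TTL keeps refreshing.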

UI and Ops Improvements

  • Side-by-side live preview and annotated overlay.
  • Smaller preview footprint.
  • More useful metrics:
    • all-time total vehicles (selected cameras)
    • vehicles summed over last 15 minutes
  • Removed less useful “last row detected” metric.
  • Added operational hints and runtime tuning controls.
  • Added scripts/preflight_stack.py and improved scripts/run_local_stack.sh startup behavior.

Debug Overlay Screenshot

City Pulse web UI with overlay debug preview

Current Gaps

  1. Counting is still frame-sum based, not “count each vehicle once.”
  2. Temporal IoU dedup is camera-dependent and approximate.
  3. yolov8s improves recall but increases latency.
  4. Streamlit deprecation warnings remain (st.components.v1.html, use_container_width).
  5. Resiliency to transient Redis/HLS interruptions could be better.
  6. ROI editor UX (drag box in UI) is not implemented.

Roadmap

Phase 1: Practical short-term improvements

  • Add ROI preset controls per camera.
  • Add dedup IoU and TTL sliders.
  • Add latency and throughput panel (infer p50/p95, queue age, frames/sec).
  • Add a preset selector: Recall, Balanced, Fast.

Phase 2: Accuracy jump

  • Add simple multi-object tracking (ByteTrack-style IDs).
  • Add optional line-crossing mode to count each object once.
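The line-crossing idea can be sketched as below, assuming a tracker already assigns stable IDs per vehicle (which is exactly the Phase 2 dependency). This counts on a downward crossing only; a real version would handle both directions and multiple lines:

```python
class LineCrossCounter:
    """Count each tracked object once, when its centroid crosses y = line_y."""

    def __init__(self, line_y: float):
        self.line_y = line_y
        self.last_y: dict[int, float] = {}   # track_id -> previous centroid y
        self.counted: set[int] = set()
        self.total = 0

    def update(self, track_id: int, cy: float) -> None:
        """Feed one (track_id, centroid y) observation per frame."""
        prev = self.last_y.get(track_id)
        self.last_y[track_id] = cy
        if prev is None or track_id in self.counted:
            return
        if prev < self.line_y <= cy:   # downward crossing
            self.counted.add(track_id)
            self.total += 1
```

Unlike frame-sum counting, this is insensitive to sampling rate as long as the tracker keeps the ID alive across the crossing.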

Phase 3: Performance

  • Profile bottlenecks and tune frame size, model size, interval, and batch behavior.
  • Evaluate Apple Silicon acceleration outside current container limits.

Reproducible Runbook

cd "/Users/jacobsmythe/i695 traffic"
source .venv/bin/activate
rtk docker compose up -d redis timescaledb yolo
rtk bash scripts/run_local_stack.sh

Open http://localhost:8501

If needed:

tail -f logs/ingest.log logs/vision.log

Baseline Env That Worked Well

  • VISION_MIN_CONFIDENCE=0.15
  • INGEST_SAMPLE_INTERVAL_SECONDS=2.0
  • INGEST_JPEG_QUALITY=92
  • VISION_DEBUG_OVERLAY_ENABLED=1
  • VISION_ROI_NORM=0.40,0.20,1.00,1.00
  • VISION_DEDUP_ENABLED=1
  • VISION_DEDUP_IOU_THRESHOLD=0.35 to 0.60 (camera dependent)
  • VISION_DEDUP_TTL_SECONDS=2.0 to 5.0
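VISION_ROI_NORM is a normalized (x1, y1, x2, y2) box over the frame. A small sketch of how the worker might parse it and test whether a detection's center falls inside (the parsing and center-point rule are assumptions for illustration):

```python
import os


def parse_roi(value: str) -> tuple[float, float, float, float]:
    """Parse 'x1,y1,x2,y2' normalized coordinates, e.g. '0.40,0.20,1.00,1.00'."""
    x1, y1, x2, y2 = (float(p) for p in value.split(","))
    if not (0 <= x1 < x2 <= 1 and 0 <= y1 < y2 <= 1):
        raise ValueError(f"invalid ROI: {value}")
    return x1, y1, x2, y2


def center_in_roi(box, roi, frame_w: int, frame_h: int) -> bool:
    """True if the pixel box's center lies inside the normalized ROI."""
    cx = (box[0] + box[2]) / 2 / frame_w
    cy = (box[1] + box[3]) / 2 / frame_h
    return roi[0] <= cx <= roi[2] and roi[1] <= cy <= roi[3]


# Worker-side usage, falling back to the baseline default above:
roi = parse_roi(os.environ.get("VISION_ROI_NORM", "0.40,0.20,1.00,1.00"))
```

Filtering on the box center (rather than requiring the whole box inside the ROI) keeps vehicles that straddle the ROI edge from flickering in and out of the count.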

Outcome

The stack is now substantially more usable than the original baseline:

  • one-command bring-up with preflight
  • reliable YOLO startup
  • live visual debugging overlay
  • runtime ingest tuning
  • ROI + temporal dedup controls
  • clearer dashboard metrics and behavior

The next major quality jump is replacing heuristic dedup with true tracking and optional line-crossing counting.