MIRA
Mars Intelligent Rover Autonomy
A real-time autonomous vision and navigation pipeline that fuses a custom Mars-trained YOLO model, monocular depth estimation, five time-of-flight sensors, and IMU data, running on a Raspberry Pi 5 with a Hailo-8L NPU that accelerates inference to up to 20 FPS.
From pixels to motor commands — in real time
MIRA implements a four-layer architecture where perception, processing, control, and actuation are cleanly separated. Visual AI inference is decoupled from sensor-driven decisions, so emergency stops are never blocked by slow inference cycles.
Vision
- YOLO26n-seg Mars terrain detection
- DepthAnything V2 ViT-S depth map
- COCO unknown object tagging
Sensing
- 5× VL53L1X TOF (±2 mm, 1.5 m)
- MPU-6050 IMU — pitch / roll / yaw
- HC-SR04 ultrasonic × 2
- LDR ambient light sensor
Decision
- 10-priority navigation waterfall
- Motion velocity tracking
- LDR-weighted depth fusion
Control
- Arduino Mega 2560
- 30 Hz sensor + command loop
- 500 ms watchdog failsafe
Purpose-built sensor array and compute stack
Every component chosen for Mars-analogue autonomy — from NPU-accelerated vision to hardware watchdog failsafes. Total budget: under 4,540 AED (~$1,235).
| Signal | Pin |
|---|---|
| TOF XSHUT 0–4 | D38–D42 |
| TOF I²C | SDA / SCL |
| Ultrasonic Front TRIG / ECHO | D24 / D25 |
| Ultrasonic Rear TRIG / ECHO | D29 / D28 |
| IMU (MPU-6050) | SDA / SCL |
| LDR | A0 |
| Motor ENA / ENB (PWM) | D3 / D9 |
| Motor IN1–IN4 | D5 / D6 / D7 / D8 |
The vision brain behind MIRA
No single model provides sufficient environmental context for reliable autonomous navigation. MIRA fuses two complementary AI models with physical sensor measurements — terrain classification to know what is present, depth estimation to know how far, and TOF to validate both.
Fine-tuned YOLO26n-seg on the AI4Mars dataset using Ultralytics on an ml.g5.4xlarge GPU instance. AI4Mars provides 326,000 semantic labels across 35,000 images captured by Curiosity, Opportunity, Spirit, and Perseverance, covering four terrain classes: soil, bedrock, sand, and big_rock.
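A minimal fine-tuning sketch with the Ultralytics API is below; the starting checkpoint name, dataset YAML, and hyperparameters are assumptions rather than the exact values used for MIRA.

```python
from ultralytics import YOLO

# Start from a pretrained nano segmentation checkpoint (name assumed).
model = YOLO("yolo26n-seg.pt")

# "ai4mars.yaml" is a hypothetical dataset config mapping the four terrain
# classes (soil, bedrock, sand, big_rock) to the AI4Mars image splits.
model.train(
    data="ai4mars.yaml",
    imgsz=640,
    epochs=100,        # assumed
    batch=32,          # assumed; sized for the ml.g5.4xlarge's 24 GB A10G GPU
    device=0,
)

# Export the CPU fallback; the Hailo-8L .hef is compiled separately from this
# ONNX with Hailo's own toolchain (not shown).
model.export(format="onnx", imgsz=640)
```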
Model selection rationale
CPU vs. NPU inference comparison
| Metric | CPU-Only (ONNX) | Hailo-8L NPU (HEF) | Speedup / Δ |
|---|---|---|---|
| YOLO26n-seg latency | 80–150 ms | 8–15 ms | 10–15× |
| DepthAnything V2 latency | 200–350 ms | ~31 ms | ~10× |
| Combined slow loop | 300–500 ms | 50–75 ms | 6–8× |
| Effective visual FPS | 1–3 FPS | 13–20 FPS | ~8× |
| DepthAnything AbsRel | 0.15 float32 | 0.16 INT8 | <7% drop |
10-priority navigation waterfall
Every decision tick evaluates conditions in strict priority order — highest danger first, all-clear last. The first matching condition wins.
| # | Condition | Action |
|---|---|---|
| 1 | Ultrasonic bumper contact (TOF failed) | STOP |
| 2 | Dangerous tilt — pitch > 20° or roll > 25° | STOP |
| 3 | Side TOF threat — obstacle < 0.8 m | TURN AWAY |
| 4 | Front obstacle < 0.5 m or close rock/unknown | STOP |
| 5 | Object approaching — velocity < −0.5 m/s | TURN AWAY |
| 6 | Slope detected + TOF confirmation | STOP / SLOW |
| 7 | Moderate tilt — pitch > 10° or roll > 15° | SLOW |
| 8 | Low ambient light — LDR < 0.25 | SLOW |
| 9 | Medium range obstacle < 1.5 m | SLOW |
| 10 | All clear | FORWARD |
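The waterfall translates almost directly into code. A condensed sketch follows; the thresholds come from the table above, while the snapshot fields and overall structure are illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass
class Snapshot:
    """Hypothetical per-tick sensor snapshot; field names are illustrative."""
    tof_front: float = 4.0                    # metres
    tof_sides: tuple = (4.0, 4.0, 4.0, 4.0)   # metres, LF/RF/LS/RS
    pitch: float = 0.0                        # degrees
    roll: float = 0.0
    ldr: float = 1.0                          # normalized 0 (dark) .. 1 (bright)
    closing_velocity: float = 0.0             # m/s, negative = approaching
    bumper_contact: bool = False
    tof_failed: bool = False
    close_rock_or_unknown: bool = False
    slope_detected: bool = False
    tof_confirms_slope: bool = False

def decide(s: Snapshot) -> str:
    """First matching condition wins, highest danger first."""
    if s.bumper_contact and s.tof_failed:              # 1  bumper contact, TOF dead
        return "STOP"
    if s.pitch > 20 or s.roll > 25:                    # 2  dangerous tilt
        return "STOP"
    if min(s.tof_sides) < 0.8:                         # 3  side TOF threat
        return "TURN_AWAY"
    if s.tof_front < 0.5 or s.close_rock_or_unknown:   # 4  front obstacle / rock
        return "STOP"
    if s.closing_velocity < -0.5:                      # 5  object approaching
        return "TURN_AWAY"
    if s.slope_detected and s.tof_confirms_slope:      # 6  slope confirmed by TOF
        return "SLOW"                                  #    (table allows STOP when severe)
    if s.pitch > 10 or s.roll > 15:                    # 7  moderate tilt
        return "SLOW"
    if s.ldr < 0.25:                                   # 8  low ambient light
        return "SLOW"
    if s.tof_front < 1.5:                              # 9  medium-range obstacle
        return "SLOW"
    return "FORWARD"                                   # 10 all clear
```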
Depth Fusion
DepthAnything V2 output is validated against the front-facing TOF sensor every frame. When they disagree, the correction strategy depends on the magnitude of disagreement.
| Disagreement | Correction | Conf. |
|---|---|---|
| ≤ 0.3 m | Blend using LDR weights | 0.95 |
| 0.3 – 1.0 m | Scale zone to TOF value | 0.60 |
| > 1.0 m | Stamp zone flat at TOF value | 0.30 |
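A sketch of the per-zone correction under these assumptions: the depth map is handled zone by zone as a NumPy array, and `ldr_weight` comes from the LDR trust curve described next.

```python
import numpy as np

def fuse_zone(depth_zone, tof_m, ldr_weight):
    """Correct one depth-map zone (NumPy array, metres) against the front TOF
    reading. Returns (corrected_zone, confidence); thresholds follow the table,
    while the blend/scale strategy is an assumed implementation."""
    est = float(np.median(depth_zone))                   # representative zone depth
    disagreement = abs(est - tof_m)

    if disagreement <= 0.3:                              # small: LDR-weighted blend
        return ldr_weight * depth_zone + (1.0 - ldr_weight) * tof_m, 0.95
    if disagreement <= 1.0:                              # moderate: rescale zone to TOF
        return depth_zone * (tof_m / max(est, 1e-6)), 0.60
    return np.full_like(depth_zone, tof_m), 0.30         # large: stamp zone flat at TOF
```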
LDR Trust Weighting
The ambient-light reading dynamically adjusts how much the fusion trusts the monocular depth map versus the physical TOF measurement.
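The exact mapping from light level to blend weight is not documented here; one plausible sketch, assuming a normalized 0 to 1 LDR value and a linear ramp between assumed breakpoints:

```python
def ldr_trust(ldr_norm, lo=0.25, hi=0.80, w_min=0.2, w_max=0.9):
    """Map a normalized LDR reading (0 = dark, 1 = bright) to the depth-map
    blend weight used by fuse_zone(). Breakpoints and weights are assumed."""
    if ldr_norm <= lo:
        return w_min                          # too dark: lean on physical TOF
    if ldr_norm >= hi:
        return w_max                          # well lit: trust the depth model
    t = (ldr_norm - lo) / (hi - lo)           # linear ramp between the breakpoints
    return w_min + t * (w_max - w_min)
```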
Three concurrent threads. One coherent system.
Sensor polling and navigation decisions run at 30 Hz, decoupled from the slower vision loop.
AI inference runs as fast as the NPU allows, falling back automatically to ONNX on the CPU when the Hailo device is unavailable.
Motor commands stream to the Arduino over serial. In manual mode, keyboard input overrides autonomous commands.
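A structural sketch of the three loops, assuming a shared state dict guarded by a lock; the stubbed I/O helpers stand in for the real serial, camera, and inference code.

```python
import threading
import time

# Stub I/O helpers so the sketch runs on its own; the real pipeline reads the
# Arduino over serial, grabs camera frames, and runs YOLO + depth inference.
def read_sensors():      return {"tof_front": 4.0, "pitch": 0.0, "roll": 0.0}
def run_vision():        return {"detections": [], "depth": None}
def send_command(cmd):   pass

state = {"sensors": None, "vision": None, "nav_cmd": "STOP", "manual_cmd": None}
lock = threading.Lock()

def sensor_nav_loop():            # 30 Hz: poll sensors, run the priority waterfall
    while True:
        packet = read_sensors()
        with lock:
            state["sensors"] = packet
            state["nav_cmd"] = "FORWARD"      # decide(packet) in the real pipeline
        time.sleep(1 / 30)

def vision_loop():                # free-running: as fast as the NPU or CPU allows
    while True:
        result = run_vision()
        with lock:
            state["vision"] = result          # never blocks the 30 Hz safety loop

def command_loop():               # 30 Hz: ship commands; manual input wins if set
    while True:
        with lock:
            cmd = state["manual_cmd"] or state["nav_cmd"]
        send_command(cmd)
        time.sleep(1 / 30)

for loop in (sensor_nav_loop, vision_loop, command_loop):
    threading.Thread(target=loop, daemon=True).start()
```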
Serial Protocol
Arduino → Pi: 13 comma-separated values at 115200 baud, 30 Hz.
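A parsing sketch with pyserial; the field order and the port path are assumptions based on the sensor list above, and the last two fields are left unidentified rather than guessed.

```python
import serial  # pyserial

FIELDS = [  # assumed order; the authoritative order lives in the Arduino firmware
    "tof_c", "tof_lf", "tof_rf", "tof_ls", "tof_rs",   # 5x VL53L1X
    "us_front", "us_rear",                              # 2x HC-SR04
    "pitch", "roll", "yaw",                             # MPU-6050
    "ldr",                                              # ambient light
    "extra_1", "extra_2",                               # remaining fields not identified here
]

def read_packet(ser):
    """Parse one comma-separated 13-value telemetry line from the Arduino."""
    line = ser.readline().decode(errors="ignore").strip()
    values = line.split(",")
    if len(values) != len(FIELDS):
        return None                     # drop malformed or partial lines
    return dict(zip(FIELDS, map(float, values)))

# Usage:
#   ser = serial.Serial("/dev/ttyACM0", 115200, timeout=0.1)
#   packet = read_packet(ser)           # arrives roughly 30 times per second
```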
AI Models
Hailo HEF for NPU-accelerated inference, ONNX for CPU fallback. Backend auto-detected at boot.
| Model | Backend | Input | Purpose |
|---|---|---|---|
| yolo26n_mars.hef | Hailo NPU | 416×416 | Mars terrain detection |
| yolo26n_mars.onnx | CPU | 640×640 | Mars YOLO — CPU fallback |
| depth_anything_v2_vits.hef | Hailo NPU | 224×224 | Monocular depth map |
| depth_anything_v2_small.onnx | CPU | 224×224 | Depth — CPU fallback |
| yolo26n-seg.onnx | CPU | 640×640 | COCO unknown tagging |
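Backend selection at boot can be sketched as a simple runtime probe, assuming the Hailo Python runtime (`hailo_platform`) is only importable when the NPU stack is installed; the project's actual check may differ.

```python
import importlib.util
from pathlib import Path

MODEL_DIR = Path("models")          # assumed location of the .hef / .onnx files

def detect_backend():
    """Pick HEF models when the Hailo runtime is importable, else ONNX on CPU."""
    if importlib.util.find_spec("hailo_platform") is not None:
        return "hailo", {
            "yolo":  MODEL_DIR / "yolo26n_mars.hef",
            "depth": MODEL_DIR / "depth_anything_v2_vits.hef",
        }
    return "cpu", {
        "yolo":  MODEL_DIR / "yolo26n_mars.onnx",
        "depth": MODEL_DIR / "depth_anything_v2_small.onnx",
    }

# The COCO tagging model (yolo26n-seg.onnx) runs on the CPU in either case.
```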
Inference benchmarks & configuration
Config Flags
All flags in pipeline/config.py. Nothing else needs changing for a typical run.
| Flag | Values | Effect |
|---|---|---|
| ARDUINO_ENABLED | 0 / 1 | 0 = dummy sensor readings (all-clear) |
| RECORD_ENABLED | 0 / 1 | 1 = write annotated AVI to logs/ |
| COCO_ENABLED | 0 / 1 | 1 = tag unknown objects via COCO |
| MOTORS_ENABLED | 0 / 1 | 0 = keyboard, 1 = autonomous pipeline |
| WINDOWS_MODE | 0 / 1 | 1 = disable tty/termios, enable VIDEO_SOURCE |
| VIDEO_SOURCE | 0 / path | 0 = live camera, path = video file |
| HAILO_AVAILABLE | auto | Auto-detected at boot |
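A sketch of how pipeline/config.py could expose these flags; the environment-variable override pattern and all defaults except MOTORS_ENABLED = 0 are assumptions.

```python
import importlib.util
import os

def _flag(name, default):
    """Integer flag, overridable from the environment (an assumed convenience)."""
    return int(os.environ.get(name, default))

ARDUINO_ENABLED = _flag("ARDUINO_ENABLED", 1)   # 0 = dummy all-clear sensor readings
RECORD_ENABLED  = _flag("RECORD_ENABLED", 0)    # 1 = write annotated AVI to logs/
COCO_ENABLED    = _flag("COCO_ENABLED", 1)      # 1 = tag unknown objects via COCO
MOTORS_ENABLED  = _flag("MOTORS_ENABLED", 0)    # 0 = keyboard, 1 = autonomous pipeline
WINDOWS_MODE    = _flag("WINDOWS_MODE", 0)      # 1 = no tty/termios, use VIDEO_SOURCE
VIDEO_SOURCE    = os.environ.get("VIDEO_SOURCE", "0")   # "0" = live camera, else file path

# Not a user flag: detected at boot by probing for the Hailo runtime.
HAILO_AVAILABLE = importlib.util.find_spec("hailo_platform") is not None
```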
Manual Drive Controls
When MOTORS_ENABLED = 0 (default), keyboard drives the rover.
    ┌─────────────────────────────┬──────────┐
    │ Camera + bbox + velocity    │  Depth   │
    │ + mode badge   1280 × 720   │ 320×180  │
    ├─────────────────────────────┴──────────┤
    │ TOF C/LF/RF/LS/RS    pitch/roll/yaw    │
    │ LDR / speed   ► NAV CMD                │
    │ 1280 × 80                              │
    └────────────────────────────────────────┘

Output: 1280 × 800 XVID AVI → logs/
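A sketch of composing the recording frame with OpenCV to match the layout above; the depth inset position and the overlay text are assumptions.

```python
import cv2
import numpy as np

def compose_frame(camera_bgr, depth_bgr, telemetry_text):
    """Build the 1280x800 recording frame: annotated camera view with a depth
    inset in the top-right corner (placement assumed) above a telemetry strip."""
    cam = cv2.resize(camera_bgr, (1280, 720))
    depth = cv2.resize(depth_bgr, (320, 180))
    cam[0:180, 960:1280] = depth                       # depth inset overlay

    bar = np.zeros((80, 1280, 3), dtype=np.uint8)      # 1280 x 80 telemetry strip
    cv2.putText(bar, telemetry_text, (10, 50),
                cv2.FONT_HERSHEY_SIMPLEX, 0.6, (255, 255, 255), 1)
    return np.vstack([cam, bar])                       # 1280 x 800

# Usage with an XVID writer:
#   writer = cv2.VideoWriter("logs/run.avi", cv2.VideoWriter_fourcc(*"XVID"), 20, (1280, 800))
#   writer.write(compose_frame(cam_frame, depth_vis, "TOF 0.82 m  pitch 3.1  NAV: FORWARD"))
```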