Autonomous · Vision · Navigation · AI-Powered Rover

MIRA

Mars Intelligent Rover Autonomy

A real-time autonomous vision and navigation pipeline fusing custom Mars-trained YOLO, monocular depth estimation, five time-of-flight sensors, and IMU data — running on a Raspberry Pi 5 with a Hailo-8L NPU accelerating inference at up to 20 FPS.

20 FPS — Hailo NPU
30 Hz — Sensor Loop
10× — NPU Speedup
4 — Terrain Classes
University of Sharjah
College of Computing and Informatics · Computer Engineering
Senior Design Project 2
Spring 2025 / 2026 · Supervisor: Dr. Sohaib Majzoub
Sharjah Academy for Astronomy
Space Science & Technology · Scientific Collaboration
01 — Overview

From pixels to motor commands — in real time

MIRA implements a four-layer architecture where perception, processing, control, and actuation are cleanly separated. Visual AI inference is decoupled from sensor-driven decisions, so emergency stops are never blocked by slow inference cycles.

PERCEPTION — Innomaker IMX415 (1280×720 · 30 FPS · USB 3.0) · VL53L1X × 5 TOF (C / LF / RF / LS / RS zones) · MPU-6050 IMU · LDR ambient light · HC-SR04 ultrasonic × 2 (front + rear bumper)
PROCESSING — Raspberry Pi 5 · Hailo-8L NPU on AI HAT+ (13 TOPS · M.2 interface) · YOLO26n-seg Mars terrain (416×416 HEF · 8–15 ms) · DepthAnything V2 ViT-S (224×224 HEF · ~31 ms) · Sensor Fusion Layer (TOF · depth · IMU · LDR · detections → FusionResult) · Decision Layer (10-priority waterfall → NavigationCommand)
CONTROL — Arduino Mega 2560 (115200 baud · 30 Hz sensor TX) · Safety Logic (bumper → instant PWM cut) · Watchdog (500 ms timeout → STOP) · PWM Generator (F / S / L / R / X → L298N drivers)
ACTUATION — L298N × 2 dual H-bridge motor drivers · DC gear motors × 4 (differential drive · encoder feedback) · Power system (7.4 V Li-Po → 5 V DC-DC) · Aluminum chassis (37.5×24×10.5 cm · ~2.4 kg)

Vision

  • YOLO26n-seg Mars terrain detection
  • DepthAnything V2 ViT-S depth map
  • COCO unknown object tagging

Sensing

  • 5× VL53L1X TOF (±2 mm, 1.5 m)
  • MPU-6050 IMU — pitch / roll / yaw
  • HC-SR04 ultrasonic × 2
  • LDR ambient light sensor

Decision

  • 10-priority navigation waterfall
  • Motion velocity tracking
  • LDR-weighted depth fusion

Control

  • Arduino Mega 2560
  • 30 Hz sensor + command loop
  • 500 ms watchdog failsafe
02 — Hardware

Purpose-built sensor array and compute stack

Every component chosen for Mars-analogue autonomy — from NPU-accelerated vision to hardware watchdog failsafes. Total budget: under 4,540 AED (~$1,235).

Compute

Raspberry Pi 5
ARM Cortex-A76 quad-core 2.4 GHz. Runs full Python vision pipeline, AI inference, sensor fusion, and navigation decisions.
Hailo-8L M.2 NPU (AI HAT+)
13 TOPS dedicated neural processing unit. Accelerates YOLO + Depth inference from 1–3 FPS (CPU) to 13–20 FPS.
Arduino Mega 2560
ATmega2560 at 16 MHz. Reads all sensors at 30 Hz, transmits CSV upstream, executes motor PWM commands.

Sensing

VL53L1X × 5 (TOF Laser)
±2 mm accuracy out to 1.5 m. Covers center-front, left-front, right-front, left-side, right-side zones. I²C with XSHUT address assignment.
MPU-6050 IMU
6-axis accelerometer and gyroscope. Provides pitch, roll, yaw for tilt detection and slope avoidance logic.
HC-SR04 × 2
Ultrasonic front + rear bumper fallback. Triggers Arduino-level emergency stop independently of the Pi pipeline.
LDR photoresistor
Normalized 0–1 on A0. Dynamically shifts depth model trust vs. physical TOF weighting in darkness.

Vision, Drive & Power

Innomaker IMX415 (USB 3.0)
Sony 4K CMOS sensor. 1280×720 @ 30 FPS for inference. Low-light down to 0.4 Lux with built-in ISP.
DC Gear Motors × 4 + L298N × 2
High-torque differential drive. Dual H-bridge drivers with encoder feedback. Software speed-limited to protect drivetrain.
7.4V 2200mAh Li-Po + DC-DC Buck
Regulated 5V rail. Inline 10A fuse. Bidirectional 5V↔3.3V logic level shifter between Arduino and Pi GPIO.
Arduino Mega 2560 — Pin Map
Signal                          Pin
TOF XSHUT 0–4                   D38–D42
TOF I²C                         SDA / SCL
Ultrasonic Front TRIG / ECHO    D24 / D25
Ultrasonic Rear TRIG / ECHO     D29 / D28
IMU (MPU-6050)                  SDA / SCL
LDR                             A0
Motor ENA / ENB (PWM)           D3 / D9
Motor IN1–IN4                   D5 / D6 / D7 / D8
03 — AI Models

The vision brain behind MIRA

No single model provides sufficient environmental context for reliable autonomous navigation. MIRA fuses two complementary AI models with physical sensor measurements — terrain classification to know what is present, depth estimation to know how far, and TOF to validate both.

Object Detection & Segmentation
YOLO26n-seg
You Only Look Once · Nano · Instance Segmentation
A lightweight nano-sized instance segmentation model fine-tuned on the AI4Mars dataset containing 326,000 labels across 35,000 images from Curiosity, Opportunity, Spirit, and Perseverance rovers. Classifies Martian terrain into four classes — soil, bedrock, sand, big_rock — with confidence threshold 0.55 (above default 0.40) to suppress false positives on featureless Martian background.
Architecture        YOLO nano single-pass
Training            AI4Mars (19,757 pairs)
Input (HEF)         416×416 NHWC
Input (ONNX)        640×640
NPU latency         8–15 ms
CPU latency         80–150 ms
Conf. threshold     0.55
HEF exec contexts   5
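
Below is a minimal sketch of the CPU-fallback detection path using the standard Ultralytics predict API, assuming the exported yolo26n_mars.onnx file from the models table (the models/ path and helper name are illustrative); the NPU path runs the compiled HEF through HailoRT instead.

from ultralytics import YOLO

# Hypothetical CPU-fallback sketch — the NPU path uses the compiled 416×416 HEF via HailoRT.
model = YOLO("models/yolo26n_mars.onnx")   # 640×640 ONNX export (path assumed)

def detect_terrain(frame):
    """Run Mars terrain segmentation with the raised 0.55 confidence threshold."""
    results = model.predict(frame, imgsz=640, conf=0.55, verbose=False)
    detections = []
    for box in results[0].boxes:
        cls_name = results[0].names[int(box.cls)]   # soil / bedrock / sand / big_rock
        detections.append((cls_name, float(box.conf), box.xyxy[0].tolist()))
    return detections
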
Monocular Depth Estimation
DepthAnything V2
ViT-S · DINOv2 Vision Transformer · Zero-Shot Depth
Monocular depth estimation model based on a DINOv2 Vision Transformer encoder, trained on over 62 million diverse real-world images. Produces a dense per-pixel depth map from a single camera frame. Selected over MiDaS and SCDepthV3 for strong zero-shot generalization to unstructured outdoor terrains. The Hailo Model Zoo provides a pre-compiled HEF splitting the transformer across 15 execution contexts on the Hailo-8L — achieving 32.4 FPS vs. 0.5 FPS on CPU. Exponential smoothing α=0.7 suppresses inter-frame flicker.
Backbone            DINOv2 ViT-S
Training data       62M+ diverse images
Input               224×224 NHWC uint8
Output              224×224 UINT16 disparity
NPU latency         ~31 ms (32.4 FPS)
CPU latency         200–350 ms (0.5 FPS)
AbsRel (float32)    0.15
AbsRel (INT8 NPU)   0.16 (<7% drop)
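
A small sketch of the α = 0.7 exponential smoothing step described above, assuming the raw UINT16 disparity output has already been converted to a float array; the exact weighting convention and state handling in the actual pipeline may differ.

import numpy as np

ALPHA = 0.7          # smoothing factor from the pipeline description
_prev_depth = None   # previous smoothed frame (module-level state, for illustration only)

def smooth_depth(depth_map: np.ndarray) -> np.ndarray:
    """Exponentially smooth successive 224×224 depth maps to suppress inter-frame flicker."""
    global _prev_depth
    if _prev_depth is None:
        _prev_depth = depth_map.astype(np.float32)
    else:
        # weight the new frame by ALPHA and carry (1 - ALPHA) of the previous estimate
        _prev_depth = ALPHA * depth_map.astype(np.float32) + (1.0 - ALPHA) * _prev_depth
    return _prev_depth
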
YOLO26n-seg — ONNX → Hailo HEF Compilation Pipeline

Fine-tuned YOLO26n-seg on the AI4Mars dataset using Ultralytics on an ml.g5.4xlarge GPU instance. AI4Mars provides 326,000 semantic labels across 35,000 images captured by Curiosity, Opportunity, Spirit, and Perseverance — covering four terrain classes: soil, bedrock, sand, and big_rock.
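
A hedged sketch of the fine-tune-and-export step using the standard Ultralytics training API; the checkpoint name, dataset YAML, epochs, and batch size are illustrative placeholders, not the project's recorded settings.

from ultralytics import YOLO

# Start from pretrained nano segmentation weights and fine-tune on AI4Mars
# ("ai4mars.yaml", epochs, and batch are placeholders, not the actual run configuration).
model = YOLO("yolo26n-seg.pt")
model.train(data="ai4mars.yaml", imgsz=640, epochs=100, batch=32, device=0)

# Export to ONNX; the ONNX graph is then compiled to a 416×416 HEF with the Hailo toolchain.
model.export(format="onnx", imgsz=640)
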

Step 1 / 5

Model selection rationale

✓ DepthAnything V2 over MiDaS / SCDepthV3
  • Trained on 62M+ diverse real-world images — strong zero-shot outdoor generalization
  • Hailo Model Zoo provides a pre-compiled HEF — no manual compilation required
  • SCDepthV3 runs at 145 FPS on Hailo but was trained only on NYUv2 indoor data (poor outdoor domain gap)
  • AbsRel 0.15 (float32) → 0.16 (INT8 NPU) — minimal quantization degradation
  • Transformer splits across 15 NPU execution contexts, eliminating CPU fallback
✓ YOLO26n-seg over YOLOv8n / standalone segmentation
  • Equivalent accuracy to YOLOv8n with a lower parameter count → smaller HEF memory footprint
  • Single-pass inference — no separate proposal stage, lower latency on embedded hardware
  • Fine-tuned on AI4Mars — 19,757 validated image-label pairs from actual Mars rover imagery
  • Standalone segmentation gives no distance information — cannot determine collision range alone
  • Monocular depth alone cannot classify terrain type — fusion is required for full context

CPU vs. NPU inference comparison

Metric                      CPU-Only (ONNX)    Hailo-8L NPU (HEF)    Speedup
YOLO26n-seg latency         80–150 ms          8–15 ms               10–15×
DepthAnything V2 latency    200–350 ms         ~31 ms                ~10×
Combined slow loop          300–500 ms         50–75 ms              6–8×
Effective visual FPS        1–3 FPS            13–20 FPS             ~8×
DepthAnything AbsRel        0.15 float32       0.16 INT8             <7% drop
04 — Pipeline

10-priority navigation waterfall

Every decision tick evaluates conditions in strict priority order — highest danger first, all-clear last. The first matching condition wins.

#     Condition                                        Action
1     Ultrasonic bumper contact (TOF failed)           STOP
2     Dangerous tilt — pitch > 20° or roll > 25°       STOP
3     Side TOF threat — obstacle < 0.8 m               TURN AWAY
4     Front obstacle < 0.5 m or close rock/unknown     STOP
5     Object approaching — velocity < −0.5 m/s         TURN AWAY
6     Slope detected + TOF confirmation                STOP / SLOW
7     Moderate tilt — pitch > 10° or roll > 15°        SLOW
8     Low ambient light — LDR < 0.25                   SLOW
9     Medium-range obstacle < 1.5 m                    SLOW
10    All clear                                        FORWARD
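
A condensed Python sketch of the waterfall: thresholds mirror the table, but the FusionResult field names (tof_front, closing_velocity, and so on) are assumed for illustration, not the pipeline's actual dataclass.

def decide(f):
    """Return the first matching command, highest danger first (field names assumed)."""
    if f.bumper_contact and not f.tof_ok:                 # 1. bumper hit while TOF failed
        return "X"                                         # STOP
    if f.pitch > 20 or f.roll > 25:                        # 2. dangerous tilt
        return "X"
    if min(f.tof_left_side, f.tof_right_side) < 0.8:       # 3. side TOF threat
        return "L" if f.tof_left_side > f.tof_right_side else "R"
    if f.tof_front < 0.5 or f.close_rock_or_unknown:       # 4. close front obstacle / rock
        return "X"
    if f.closing_velocity < -0.5:                          # 5. object approaching
        return "L" if f.tof_left_side > f.tof_right_side else "R"
    if f.slope_detected and f.tof_confirms_slope:          # 6. slope + TOF confirmation
        return "X" if f.slope_severe else "S"
    if f.pitch > 10 or f.roll > 15:                        # 7. moderate tilt
        return "S"                                          # SLOW
    if f.ldr < 0.25:                                       # 8. low ambient light
        return "S"
    if f.tof_front < 1.5:                                  # 9. medium-range obstacle
        return "S"
    return "F"                                             # 10. all clear — FORWARD
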

Depth Fusion

DepthAnything V2 output is validated against the front-facing TOF sensor every frame. When they disagree, the correction strategy depends on the magnitude of disagreement.

Disagreement    Correction                      Conf.
≤ 0.3 m         Blend using LDR weights         0.95
0.3 – 1.0 m     Scale zone to TOF value         0.60
> 1.0 m         Stamp zone flat at TOF value    0.30
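
A sketch of the per-zone correction under assumed representations: each zone is a float depth array compared against its TOF reading, and the depth/TOF trust weights come from the LDR table in the next subsection. Thresholds and confidences follow the table above; everything else is illustrative.

import numpy as np

def correct_zone(zone_depth: np.ndarray, tof_m: float, depth_w: float, tof_w: float):
    """Reconcile one depth-map zone with its TOF reading; returns (corrected zone, confidence)."""
    zone_center = float(np.median(zone_depth))
    disagreement = abs(zone_center - tof_m)
    if disagreement <= 0.3:
        # small mismatch: blend the zone with the TOF value using LDR trust weights
        return depth_w * zone_depth + tof_w * tof_m, 0.95
    if disagreement <= 1.0:
        # moderate mismatch: rescale the whole zone so its median matches the TOF reading
        return zone_depth * (tof_m / max(zone_center, 1e-6)), 0.60
    # large mismatch: stamp the zone flat at the TOF value, low confidence
    return np.full_like(zone_depth, tof_m), 0.30
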

LDR Trust Weighting

Ambient light reading dynamically adjusts how much to trust the depth camera vs. physical TOF.

Ambient (LDR)        Depth trust    TOF trust
Bright (≥ 0.6)       70%            30%
Dim (0.25 – 0.6)     30%            70%
Dark (< 0.25)        0%             100%
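
The same three-band lookup as a small helper; band boundaries and weights come from the table, while the function name and return convention are illustrative.

def ldr_trust_weights(ldr: float) -> tuple[float, float]:
    """Map normalized ambient light (0–1) to (depth_weight, tof_weight)."""
    if ldr >= 0.6:       # bright: trust the depth model more
        return 0.7, 0.3
    if ldr >= 0.25:      # dim: lean on the physical TOF readings
        return 0.3, 0.7
    return 0.0, 1.0      # dark: ignore the depth model entirely
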
05 — Architecture

Three concurrent threads. One coherent system.

Fast Loop — ~30 Hz · daemon thread
Arduino.update() → decide() → NavigationCommand

Sensor polling and navigation decisions run at 30 Hz, decoupled from the slower vision loop.

Vision Loop — 8–20 FPS (Hailo) · 1–2 FPS (CPU) · daemon thread
Camera frame → YOLO inference → Depth inference → COCO tagging → fuse() → display + record

AI inference runs as fast as the NPU allows — auto-falls back to ONNX CPU when Hailo is unavailable.

Main Loop — 50 ms tick · main thread
AUTO: send NavigationCommand · MANUAL: keyboard → command · Arduino serial write

Coordinates command output to the Arduino. In manual mode, keyboard input overrides autonomous commands.
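
A minimal sketch of the three-thread layout. The stand-in classes and functions (ArduinoLink, decide, vision_step) are placeholders so the sketch runs on its own; they are not the project's actual module structure.

import threading, time

# Illustrative stand-ins for the real pipeline objects — names and behavior are assumed.
class ArduinoLink:
    def update(self):
        return {"ldr": 1.0}          # would parse the newest 13-value CSV frame
    def send(self, cmd):
        print("cmd:", cmd)           # would write F / S / L / R / X over serial

def decide(fusion):
    return "F"                       # placeholder for the 10-priority waterfall

def vision_step():
    return [], None                  # placeholder for YOLO + depth + fuse()

state, arduino = {}, ArduinoLink()

def fast_loop():
    while True:                      # ~30 Hz sensor polling + navigation decision
        state["command"] = decide(arduino.update())
        time.sleep(1 / 30)

def vision_loop():
    while True:                      # runs as fast as the NPU (or CPU fallback) allows
        state["detections"], state["depth"] = vision_step()
        time.sleep(0.05)

threading.Thread(target=fast_loop, daemon=True).start()
threading.Thread(target=vision_loop, daemon=True).start()

while True:                          # main thread — 50 ms command tick to the Arduino
    arduino.send(state.get("command", "X"))
    time.sleep(0.05)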

Serial Protocol

Arduino → Pi: 13 comma-separated values at 115200 baud, 30 Hz.

tof0, tof1, tof2, tof3, tof4, pitch, roll, yaw, ldr, ultra_front, ultra_rear, speed, tof_ok
Index    Field                 Meaning
0–4      tof0 – tof4           5× TOF distances (m)
5–7      pitch / roll / yaw    IMU Euler angles (°)
8        ldr                   Ambient light, 0.0 – 1.0
9        ultra_front           Front ultrasonic (m)
10       ultra_rear            Rear ultrasonic (m)
11       speed                 Current motor speed
12       tof_ok                TOF health flag (0/1)
Commands (Pi → Arduino)
F — FORWARD
S — SLOW
L — TURN LEFT
R — TURN RIGHT
X — STOP
500 ms watchdog — Arduino forces STOP if no command received.
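
A pyserial sketch of both directions of the protocol. The field order, baud rate, and command characters follow the tables above; the port path and helper names are assumptions.

import serial

FIELDS = ["tof0", "tof1", "tof2", "tof3", "tof4", "pitch", "roll", "yaw",
          "ldr", "ultra_front", "ultra_rear", "speed", "tof_ok"]

ser = serial.Serial("/dev/ttyACM0", 115200, timeout=0.1)   # port path assumed

def read_telemetry():
    """Parse one 13-value CSV frame from the Arduino; returns None on a short or garbled line."""
    line = ser.readline().decode(errors="ignore").strip()
    parts = line.split(",")
    if len(parts) != len(FIELDS):
        return None
    return dict(zip(FIELDS, map(float, parts)))

def send_command(cmd: str):
    """Send a single-character command (F / S / L / R / X). The Arduino-side watchdog
    forces STOP if nothing arrives within 500 ms, so this should be called every tick."""
    assert cmd in "FSLRX"
    ser.write(cmd.encode())
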

AI Models

Hailo HEF for NPU-accelerated inference, ONNX for CPU fallback. Backend auto-detected at boot.

Model                           Backend      Input      Purpose
yolo26n_mars.hef                Hailo NPU    416×416    Mars terrain detection
yolo26n_mars.onnx               CPU          640×640    Mars YOLO — CPU fallback
depth_anything_v2_vits.hef      Hailo NPU    224×224    Monocular depth map
depth_anything_v2_small.onnx    CPU          224×224    Depth — CPU fallback
yolo26n-seg.onnx                CPU          640×640    COCO unknown tagging
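
One way the boot-time backend detection can work, assuming the HailoRT Python bindings are installed as the hailo_platform package; the project's actual probe and model paths may differ.

import importlib.util

# True when the HailoRT Python bindings are importable — otherwise fall back to ONNX on CPU.
HAILO_AVAILABLE = importlib.util.find_spec("hailo_platform") is not None

# Model paths are illustrative; filenames follow the table above.
YOLO_MODEL  = "models/yolo26n_mars.hef" if HAILO_AVAILABLE else "models/yolo26n_mars.onnx"
DEPTH_MODEL = ("models/depth_anything_v2_vits.hef" if HAILO_AVAILABLE
               else "models/depth_anything_v2_small.onnx")
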
06 — Performance

Inference benchmarks & configuration

★ Optimal — 13–20 FPS — both models on the Hailo NPU (YOLO: HEF · Depth: HEF)
5–8 FPS — split, one model on Hailo and one on CPU (YOLO: HEF/ONNX · Depth: ONNX/HEF)
1–2 FPS — both models on CPU ONNX (YOLO: ONNX · Depth: ONNX)

Config Flags

All flags live in pipeline/config.py; nothing else needs changing for a typical run.

Flag               Values      Effect
ARDUINO_ENABLED    0 / 1       0 = dummy sensor readings (all-clear)
RECORD_ENABLED     0 / 1       1 = write annotated AVI to logs/
COCO_ENABLED       0 / 1       1 = tag unknown objects via COCO
MOTORS_ENABLED     0 / 1       0 = keyboard, 1 = autonomous pipeline
WINDOWS_MODE       0 / 1       1 = disable tty/termios, enable VIDEO_SOURCE
VIDEO_SOURCE       0 / path    0 = live camera, path = video file
HAILO_AVAILABLE    auto        Auto-detected at boot
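
A minimal sketch of how the flag layout in pipeline/config.py can look as plain module constants; the values shown are placeholders (only MOTORS_ENABLED = 0 is documented as the default), not the file's actual contents.

# pipeline/config.py — illustrative sketch of the flag layout, values are placeholders.
ARDUINO_ENABLED = 1      # 0 = dummy all-clear sensor readings
RECORD_ENABLED  = 0      # 1 = write annotated AVI to logs/
COCO_ENABLED    = 1      # 1 = tag unknown objects via the COCO model
MOTORS_ENABLED  = 0      # 0 = keyboard drive (default), 1 = autonomous pipeline
WINDOWS_MODE    = 0      # 1 = disable tty/termios and read from VIDEO_SOURCE
VIDEO_SOURCE    = 0      # 0 = live camera, or a string path to a recorded video file
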

Manual Drive Controls

When MOTORS_ENABLED = 0 (default), keyboard drives the rover.

W / ↑ — FORWARD
A / ← — TURN LEFT
D / → — TURN RIGHT
Q — SLOW
S / ↓ / Space — STOP
Ctrl+C — Quit
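
A tiny lookup mapping the manual-drive keys onto the serial command characters; the real pipeline reads raw tty/termios input, so this dictionary (with arrow-key escape sequences omitted) is only an illustration. Falling back to STOP on unknown keys is a design choice of this sketch, not a documented behavior.

# Manual-drive key → serial command character (illustrative mapping).
KEYMAP = {
    "w": "F",   # FORWARD
    "a": "L",   # TURN LEFT
    "d": "R",   # TURN RIGHT
    "q": "S",   # SLOW
    "s": "X",   # STOP
    " ": "X",   # Space also stops
}

def key_to_command(key: str) -> str:
    """Translate a pressed key to a command; unknown keys default to STOP for safety."""
    return KEYMAP.get(key.lower(), "X")
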
Recorded Video Layout
┌─────────────────────────────┬──────────┐
│  Camera + bbox + velocity   │ Depth    │
│  + mode badge  1280 × 720   │ 320×180  │
├─────────────────────────────┴──────────┤
│  TOF C/LF/RF/LS/RS    pitch/roll/yaw  │
│  LDR / speed          ► NAV CMD       │
│               1280 × 80               │
└───────────────────────────────────────┘
Output: 1280 × 800  XVID AVI → logs/
Research & Links
Coming soon

Personal website, research papers, and future work links will appear here.

Open Source

Explore the full codebase

Arduino firmware, Python pipeline, ONNX model configs, hardware test sketches, and a Flask streaming dashboard — all in one repo.