MIRA
Mars Intelligent Rover Autonomy
A real-time autonomous vision and navigation pipeline that fuses a custom Mars-trained YOLO model, monocular depth estimation, five time-of-flight sensors, and IMU data, running on a Raspberry Pi 5 with a Hailo-8L NPU that accelerates inference to up to 20 FPS.
From pixels to motor commands — in real time
MIRA implements a four-layer architecture where perception, processing, control, and actuation are cleanly separated. Visual AI inference is decoupled from sensor-driven decisions, so emergency stops are never blocked by slow inference cycles.
Vision
- YOLO26n-seg Mars terrain detection
- DepthAnything V2 ViT-S depth map
- COCO unknown object tagging
Sensing
- 5× VL53L1X TOF (±2 mm, 1.5 m)
- MPU-6050 IMU — pitch / roll / yaw
- HC-SR04 ultrasonic × 2
- LDR ambient light sensor
Decision
- 10-priority navigation waterfall
- Motion velocity tracking
- LDR-weighted depth fusion
Control
- Arduino Mega 2560
- 30 Hz sensor + command loop
- 500 ms watchdog failsafe
Purpose-built sensor array and compute stack
Every component chosen for Mars-analogue autonomy — from NPU-accelerated vision to hardware watchdog failsafes. Total budget: under 4,540 AED (~$1,235).
| Signal | Pin |
|---|---|
| TOF XSHUT 0–4 | D38–D42 |
| TOF I²C | SDA / SCL |
| Ultrasonic Front TRIG / ECHO | D24 / D25 |
| Ultrasonic Rear TRIG / ECHO | D29 / D28 |
| IMU (MPU-6050) | SDA / SCL |
| LDR | A0 |
| Motor ENA / ENB (PWM) | D3 / D9 |
| Motor IN1–IN4 | D5 / D6 / D7 / D8 |
The vision brain behind MIRA
No single model provides sufficient environmental context for reliable autonomous navigation. MIRA fuses two complementary AI models with physical sensor measurements — terrain classification to know what is present, depth estimation to know how far, and TOF to validate both.
Fine-tuned YOLO26n-seg on the AI4Mars dataset using Ultralytics on an ml.g5.4xlarge GPU instance. AI4Mars provides 326,000 semantic labels across 35,000 images captured by Curiosity, Opportunity, Spirit, and Perseverance, covering four terrain classes: soil, bedrock, sand, and big_rock.
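A minimal fine-tuning sketch with the Ultralytics API is below; the starting checkpoint name, dataset YAML, and hyperparameters are assumptions rather than the exact values used for MIRA.

```python
from ultralytics import YOLO

# Start from a pretrained nano segmentation checkpoint (name assumed).
model = YOLO("yolo26n-seg.pt")

# "ai4mars.yaml" is a hypothetical dataset config mapping the four terrain
# classes (soil, bedrock, sand, big_rock) to the AI4Mars image splits.
model.train(
    data="ai4mars.yaml",
    imgsz=640,
    epochs=100,        # assumed
    batch=32,          # assumed; sized for the ml.g5.4xlarge's 24 GB A10G GPU
    device=0,
)

# Export the CPU fallback; the Hailo-8L .hef is compiled separately from this
# ONNX with Hailo's own toolchain (not shown).
model.export(format="onnx", imgsz=640)
```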
Model selection rationale
CPU vs. NPU inference comparison
| Metric | CPU-Only (ONNX) | Hailo-8L NPU (HEF) | Speedup / Δ |
|---|---|---|---|
| YOLO26n-seg latency | 80–150 ms | 8–15 ms | 10–15× |
| DepthAnything V2 latency | 200–350 ms | ~31 ms | ~10× |
| Combined slow loop | 300–500 ms | 50–75 ms | 6–8× |
| Effective visual FPS | 1–3 FPS | 13–20 FPS | ~8× |
| DepthAnything AbsRel | 0.15 float32 | 0.16 INT8 | <7% drop |
10-priority navigation waterfall
Every decision tick evaluates conditions in strict priority order — highest danger first, all-clear last. The first matching condition wins.
| # | Condition | Action |
|---|---|---|
| 1 | Ultrasonic bumper contact (TOF failed) | STOP |
| 2 | Dangerous tilt — pitch > 20° or roll > 25° | STOP |
| 3 | Side TOF threat — obstacle < 0.8 m | TURN AWAY |
| 4 | Front obstacle < 0.5 m or close rock/unknown | STOP |
| 5 | Object approaching — velocity < −0.5 m/s | TURN AWAY |
| 6 | Slope detected + TOF confirmation | STOP / SLOW |
| 7 | Moderate tilt — pitch > 10° or roll > 15° | SLOW |
| 8 | Low ambient light — LDR < 0.25 | SLOW |
| 9 | Medium range obstacle < 1.5 m | SLOW |
| 10 | All clear | FORWARD |
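The waterfall translates almost directly into code. A condensed sketch follows; the thresholds come from the table above, while the snapshot fields and overall structure are illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass
class Snapshot:
    """Hypothetical per-tick sensor snapshot; field names are illustrative."""
    tof_front: float = 4.0                    # metres
    tof_sides: tuple = (4.0, 4.0, 4.0, 4.0)   # metres, LF/RF/LS/RS
    pitch: float = 0.0                        # degrees
    roll: float = 0.0
    ldr: float = 1.0                          # normalized 0 (dark) .. 1 (bright)
    closing_velocity: float = 0.0             # m/s, negative = approaching
    bumper_contact: bool = False
    tof_failed: bool = False
    close_rock_or_unknown: bool = False
    slope_detected: bool = False
    tof_confirms_slope: bool = False

def decide(s: Snapshot) -> str:
    """First matching condition wins, highest danger first."""
    if s.bumper_contact and s.tof_failed:              # 1  bumper contact, TOF dead
        return "STOP"
    if s.pitch > 20 or s.roll > 25:                    # 2  dangerous tilt
        return "STOP"
    if min(s.tof_sides) < 0.8:                         # 3  side TOF threat
        return "TURN_AWAY"
    if s.tof_front < 0.5 or s.close_rock_or_unknown:   # 4  front obstacle / rock
        return "STOP"
    if s.closing_velocity < -0.5:                      # 5  object approaching
        return "TURN_AWAY"
    if s.slope_detected and s.tof_confirms_slope:      # 6  slope confirmed by TOF
        return "SLOW"                                  #    (table allows STOP when severe)
    if s.pitch > 10 or s.roll > 15:                    # 7  moderate tilt
        return "SLOW"
    if s.ldr < 0.25:                                   # 8  low ambient light
        return "SLOW"
    if s.tof_front < 1.5:                              # 9  medium-range obstacle
        return "SLOW"
    return "FORWARD"                                   # 10 all clear
```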
Depth Fusion
DepthAnything V2 output is validated against the front-facing TOF sensor every frame. When they disagree, the correction strategy depends on the magnitude of disagreement.
| Disagreement | Correction | Conf. |
|---|---|---|
| ≤ 0.3 m | Blend using LDR weights | 0.95 |
| 0.3 – 1.0 m | Scale zone to TOF value | 0.60 |
| > 1.0 m | Stamp zone flat at TOF value | 0.30 |
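A sketch of the per-zone correction under these assumptions: the depth map is handled zone by zone as a NumPy array, and `ldr_weight` comes from the LDR trust curve described next.

```python
import numpy as np

def fuse_zone(depth_zone, tof_m, ldr_weight):
    """Correct one depth-map zone (NumPy array, metres) against the front TOF
    reading. Returns (corrected_zone, confidence); thresholds follow the table,
    while the blend/scale strategy is an assumed implementation."""
    est = float(np.median(depth_zone))                   # representative zone depth
    disagreement = abs(est - tof_m)

    if disagreement <= 0.3:                              # small: LDR-weighted blend
        return ldr_weight * depth_zone + (1.0 - ldr_weight) * tof_m, 0.95
    if disagreement <= 1.0:                              # moderate: rescale zone to TOF
        return depth_zone * (tof_m / max(est, 1e-6)), 0.60
    return np.full_like(depth_zone, tof_m), 0.30         # large: stamp zone flat at TOF
```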
LDR Trust Weighting
The ambient-light reading dynamically adjusts how much the fusion trusts the monocular depth map versus the physical TOF measurement.
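The exact mapping from light level to blend weight is not documented here; one plausible sketch, assuming a normalized 0 to 1 LDR value and a linear ramp between assumed breakpoints:

```python
def ldr_trust(ldr_norm, lo=0.25, hi=0.80, w_min=0.2, w_max=0.9):
    """Map a normalized LDR reading (0 = dark, 1 = bright) to the depth-map
    blend weight used by fuse_zone(). Breakpoints and weights are assumed."""
    if ldr_norm <= lo:
        return w_min                          # too dark: lean on physical TOF
    if ldr_norm >= hi:
        return w_max                          # well lit: trust the depth model
    t = (ldr_norm - lo) / (hi - lo)           # linear ramp between the breakpoints
    return w_min + t * (w_max - w_min)
```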
Three concurrent threads. One coherent system.
Sensor polling and navigation decisions run at 30 Hz, decoupled from the slower vision loop.
AI inference runs as fast as the NPU allows, falling back automatically to ONNX on the CPU when the Hailo device is unavailable.
Motor commands stream to the Arduino over serial. In manual mode, keyboard input overrides autonomous commands.
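A structural sketch of the three loops, assuming a shared state dict guarded by a lock; the stubbed I/O helpers stand in for the real serial, camera, and inference code.

```python
import threading
import time

# Stub I/O helpers so the sketch runs on its own; the real pipeline reads the
# Arduino over serial, grabs camera frames, and runs YOLO + depth inference.
def read_sensors():      return {"tof_front": 4.0, "pitch": 0.0, "roll": 0.0}
def run_vision():        return {"detections": [], "depth": None}
def send_command(cmd):   pass

state = {"sensors": None, "vision": None, "nav_cmd": "STOP", "manual_cmd": None}
lock = threading.Lock()

def sensor_nav_loop():            # 30 Hz: poll sensors, run the priority waterfall
    while True:
        packet = read_sensors()
        with lock:
            state["sensors"] = packet
            state["nav_cmd"] = "FORWARD"      # decide(packet) in the real pipeline
        time.sleep(1 / 30)

def vision_loop():                # free-running: as fast as the NPU or CPU allows
    while True:
        result = run_vision()
        with lock:
            state["vision"] = result          # never blocks the 30 Hz safety loop

def command_loop():               # 30 Hz: ship commands; manual input wins if set
    while True:
        with lock:
            cmd = state["manual_cmd"] or state["nav_cmd"]
        send_command(cmd)
        time.sleep(1 / 30)

for loop in (sensor_nav_loop, vision_loop, command_loop):
    threading.Thread(target=loop, daemon=True).start()
```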
Serial Protocol
Arduino → Pi: 13 comma-separated values at 115200 baud, 30 Hz.
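A parsing sketch with pyserial; the field order and the port path are assumptions based on the sensor list above, and the last two fields are left unidentified rather than guessed.

```python
import serial  # pyserial

FIELDS = [  # assumed order; the authoritative order lives in the Arduino firmware
    "tof_c", "tof_lf", "tof_rf", "tof_ls", "tof_rs",   # 5x VL53L1X
    "us_front", "us_rear",                              # 2x HC-SR04
    "pitch", "roll", "yaw",                             # MPU-6050
    "ldr",                                              # ambient light
    "extra_1", "extra_2",                               # remaining fields not identified here
]

def read_packet(ser):
    """Parse one comma-separated 13-value telemetry line from the Arduino."""
    line = ser.readline().decode(errors="ignore").strip()
    values = line.split(",")
    if len(values) != len(FIELDS):
        return None                     # drop malformed or partial lines
    return dict(zip(FIELDS, map(float, values)))

# Usage:
#   ser = serial.Serial("/dev/ttyACM0", 115200, timeout=0.1)
#   packet = read_packet(ser)           # arrives roughly 30 times per second
```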
AI Models
Hailo HEF for NPU-accelerated inference, ONNX for CPU fallback. Backend auto-detected at boot.
| Model | Backend | Input | Purpose |
|---|---|---|---|
| yolo26n_mars.hef | Hailo NPU | 416×416 | Mars terrain detection |
| yolo26n_mars.onnx | CPU | 640×640 | Mars YOLO — CPU fallback |
| depth_anything_v2_vits.hef | Hailo NPU | 224×224 | Monocular depth map |
| depth_anything_v2_small.onnx | CPU | 224×224 | Depth — CPU fallback |
| yolo26n-seg.onnx | CPU | 640×640 | COCO unknown tagging |
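Backend selection at boot can be sketched as a simple runtime probe, assuming the Hailo Python runtime (`hailo_platform`) is only importable when the NPU stack is installed; the project's actual check may differ.

```python
import importlib.util
from pathlib import Path

MODEL_DIR = Path("models")          # assumed location of the .hef / .onnx files

def detect_backend():
    """Pick HEF models when the Hailo runtime is importable, else ONNX on CPU."""
    if importlib.util.find_spec("hailo_platform") is not None:
        return "hailo", {
            "yolo":  MODEL_DIR / "yolo26n_mars.hef",
            "depth": MODEL_DIR / "depth_anything_v2_vits.hef",
        }
    return "cpu", {
        "yolo":  MODEL_DIR / "yolo26n_mars.onnx",
        "depth": MODEL_DIR / "depth_anything_v2_small.onnx",
    }

# The COCO tagging model (yolo26n-seg.onnx) runs on the CPU in either case.
```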
Inference benchmarks & configuration
Config Flags
All flags in pipeline/config.py. Nothing else needs changing for a typical run.
| Flag | Values | Effect |
|---|---|---|
| ARDUINO_ENABLED | 0 / 1 | 0 = dummy sensor readings (all-clear) |
| RECORD_ENABLED | 0 / 1 | 1 = write annotated AVI to logs/ |
| COCO_ENABLED | 0 / 1 | 1 = tag unknown objects via COCO |
| MOTORS_ENABLED | 0 / 1 | 0 = keyboard, 1 = autonomous pipeline |
| WINDOWS_MODE | 0 / 1 | 1 = disable tty/termios, enable VIDEO_SOURCE |
| VIDEO_SOURCE | 0 / path | 0 = live camera, path = video file |
| HAILO_AVAILABLE | auto | Auto-detected at boot |
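A sketch of how pipeline/config.py could expose these flags; the environment-variable override pattern and all defaults except MOTORS_ENABLED = 0 are assumptions.

```python
import importlib.util
import os

def _flag(name, default):
    """Integer flag, overridable from the environment (an assumed convenience)."""
    return int(os.environ.get(name, default))

ARDUINO_ENABLED = _flag("ARDUINO_ENABLED", 1)   # 0 = dummy all-clear sensor readings
RECORD_ENABLED  = _flag("RECORD_ENABLED", 0)    # 1 = write annotated AVI to logs/
COCO_ENABLED    = _flag("COCO_ENABLED", 1)      # 1 = tag unknown objects via COCO
MOTORS_ENABLED  = _flag("MOTORS_ENABLED", 0)    # 0 = keyboard, 1 = autonomous pipeline
WINDOWS_MODE    = _flag("WINDOWS_MODE", 0)      # 1 = no tty/termios, use VIDEO_SOURCE
VIDEO_SOURCE    = os.environ.get("VIDEO_SOURCE", "0")   # "0" = live camera, else file path

# Not a user flag: detected at boot by probing for the Hailo runtime.
HAILO_AVAILABLE = importlib.util.find_spec("hailo_platform") is not None
```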
Manual Drive Controls
When MOTORS_ENABLED = 0 (default), keyboard drives the rover.
    ┌─────────────────────────────┬──────────┐
    │ Camera + bbox + velocity    │  Depth   │
    │ + mode badge   1280 × 720   │ 320×180  │
    ├─────────────────────────────┴──────────┤
    │ TOF C/LF/RF/LS/RS    pitch/roll/yaw    │
    │ LDR / speed   ► NAV CMD                │
    │ 1280 × 80                              │
    └────────────────────────────────────────┘

Output: 1280 × 800 XVID AVI → logs/
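A sketch of composing the recording frame with OpenCV to match the layout above; the depth inset position and the overlay text are assumptions.

```python
import cv2
import numpy as np

def compose_frame(camera_bgr, depth_bgr, telemetry_text):
    """Build the 1280x800 recording frame: annotated camera view with a depth
    inset in the top-right corner (placement assumed) above a telemetry strip."""
    cam = cv2.resize(camera_bgr, (1280, 720))
    depth = cv2.resize(depth_bgr, (320, 180))
    cam[0:180, 960:1280] = depth                       # depth inset overlay

    bar = np.zeros((80, 1280, 3), dtype=np.uint8)      # 1280 x 80 telemetry strip
    cv2.putText(bar, telemetry_text, (10, 50),
                cv2.FONT_HERSHEY_SIMPLEX, 0.6, (255, 255, 255), 1)
    return np.vstack([cam, bar])                       # 1280 x 800

# Usage with an XVID writer:
#   writer = cv2.VideoWriter("logs/run.avi", cv2.VideoWriter_fourcc(*"XVID"), 20, (1280, 800))
#   writer.write(compose_frame(cam_frame, depth_vis, "TOF 0.82 m  pitch 3.1  NAV: FORWARD"))
```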