Bare Metal vs RTOS

Bare metal means running code directly on hardware with no operating system — just main(), a super loop, and interrupt handlers. An RTOS adds a scheduler, tasks, and synchronization primitives. The choice shapes every aspect of firmware architecture.

Why It Matters

Picking the wrong approach costs time. Bare metal on a complex multi-sensor system leads to a tangled state machine. An RTOS on a simple LED blinker adds RAM overhead and debug complexity for no benefit. Understanding the trade-offs lets you choose correctly and know when to switch.

How It Works

The Super Loop (Bare Metal)

The simplest embedded architecture: initialize hardware, then loop forever.

int main(void) {
    clock_init();
    gpio_init();
    uart_init();
    adc_init();
 
    while (1) {
        read_sensors();       // ~2 ms
        process_data();       // ~5 ms
        update_display();     // ~10 ms
        check_buttons();      // ~1 ms
        // total loop time: ~18 ms -> everything runs at ~55 Hz
    }
}

Problem: every function must wait for the previous one to finish. If update_display() takes 50 ms (e.g., SPI to an LCD), one pass through the loop takes about 58 ms, so check_buttons() runs at only about 17 Hz. A button press shorter than one pass can be missed entirely.
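
The arithmetic behind these rates can be checked on a PC. The following is a minimal sketch in plain C; the helper names loop_period_ms and loop_rate_hz are hypothetical, and the durations are the rough estimates from the code comments above, not measurements.

```c
#include <stddef.h>

/* Time for one pass of the super loop: the sum of every call's duration (ms). */
double loop_period_ms(const double *durations_ms, size_t count) {
    double total = 0.0;
    for (size_t i = 0; i < count; i++) {
        total += durations_ms[i];
    }
    return total;
}

/* Every function in the loop runs once per pass, so its rate is 1000 / period. */
double loop_rate_hz(const double *durations_ms, size_t count) {
    double period = loop_period_ms(durations_ms, count);
    return period > 0.0 ? 1000.0 / period : 0.0;
}
```

With durations of 2, 5, 10, and 1 ms the pass takes 18 ms (about 55 Hz). Raising the display time to 50 ms and counting the other three calls as well gives 58 ms, so every function in the loop, check_buttons() included, runs at roughly 17 Hz.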

Improving Bare Metal: ISR + Flags

Timer and peripheral interrupts break the loop dependency:

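#include <stdint.h>
// Assumption: an STM32 CMSIS device header provides EXTI and __WFI().

volatile uint8_t sensor_flag = 0;   // set in ISRs, cleared in the main loop
volatile uint8_t button_flag = 0;
volatile uint32_t ms_ticks = 0;

void SysTick_Handler(void) {        // assumed to fire every 1 ms
    ms_ticks++;
    if (ms_ticks % 100 == 0) sensor_flag = 1;  // every 100ms
}

void EXTI0_IRQHandler(void) {
    EXTI->PR = (1u << 0);           // clear the pending bit for line 0 (write 1 to clear)
    button_flag = 1;
}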
 
int main(void) {
    init_all();
    while (1) {
        if (sensor_flag) {
            sensor_flag = 0;
            read_sensors();
            process_data();
        }
        if (button_flag) {
            button_flag = 0;
            handle_button();
        }
        update_display();    // runs every loop iteration
        __WFI();             // sleep until next interrupt
    }
}

This “super loop + ISR flags” pattern handles moderate complexity well. It adds almost no RAM (a few bytes of flags), its timing is predictable, and it is easy to reason about.
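
One property of the pattern is worth seeing in numbers: a flag records that an event happened, not how many times. The following host-side model is a sketch under stated assumptions, not target code. The names FlagModel and simulate_flag_pattern are hypothetical, the "ISR" is an ordinary function call, and the tick is assumed to be 1 ms.

```c
#include <stdint.h>

/* Host-side model of one ISR flag (hypothetical names, runs on a PC). */
typedef struct {
    volatile uint8_t sensor_flag;  /* set by the simulated ISR, cleared by the loop */
    uint32_t ms_ticks;             /* simulated millisecond counter */
    uint32_t sensor_runs;          /* number of times the loop handled the flag */
} FlagModel;

/* Stands in for SysTick_Handler: raise the flag every 100 ms. */
static void model_tick(FlagModel *m) {
    m->ms_ticks++;
    if (m->ms_ticks % 100 == 0) m->sensor_flag = 1;
}

/* Stands in for one pass of the main loop. */
static void model_loop_pass(FlagModel *m) {
    if (m->sensor_flag) {
        m->sensor_flag = 0;
        m->sensor_runs++;
    }
}

/* Simulate total_ms of ticks with one loop pass every pass_ms.
   Returns how many times the sensor work ran. */
uint32_t simulate_flag_pattern(uint32_t total_ms, uint32_t pass_ms) {
    FlagModel m = {0, 0, 0};
    if (pass_ms == 0) return 0;  /* avoid dividing by zero */
    for (uint32_t t = 1; t <= total_ms; t++) {
        model_tick(&m);
        if (t % pass_ms == 0) model_loop_pass(&m);
    }
    return m.sensor_runs;
}
```

Over one simulated second, a loop that completes a pass every 1 ms handles all 10 sensor events. A loop that needs 250 ms per pass handles only 4, because several flag sets collapse into one. If every event must be counted, a counter or a small queue is a common substitute for the single flag.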

When RTOS Adds Value

The RTOS version of the same system:

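void sensor_task(void *p) {
    Result result;   // assumption: process_data() fills this in for the display
    while (1) {
        read_sensors();
        process_data(&result);
        // Blocks only while display_queue is full
        xQueueSend(display_queue, &result, portMAX_DELAY);
        vTaskDelay(pdMS_TO_TICKS(100));   // 100 ms pause, so the period is 100 ms plus run time
    }
}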
 
void button_task(void *p) {
    while (1) {
        xSemaphoreTake(button_sem, portMAX_DELAY);  // blocks until ISR
        handle_button();
    }
}
 
void display_task(void *p) {
    Result r;
    while (1) {
        xQueueReceive(display_queue, &r, portMAX_DELAY);
        update_display(&r);
    }
}

Each concern is isolated. A slow display update does not block sensor reading as long as display_queue has room. Priorities ensure the most important ready task runs first. Blocking calls (queue receive, semaphore take) replace the polling and flags of the super loop.
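
The priority rule can be modeled in a few lines. This is a sketch of the idea only, not FreeRTOS's actual scheduler; TaskState and pick_next_task are hypothetical names. It assumes, as FreeRTOS does, that a larger number means a higher priority, and that a task waiting on a queue, semaphore, or delay is not ready.

```c
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    const char *name;
    unsigned priority;  /* larger number = more important */
    bool ready;         /* false while blocked on a queue, semaphore, or delay */
} TaskState;

/* Return the index of the highest-priority ready task,
   or -1 if every task is blocked (the system would idle). */
int pick_next_task(const TaskState *tasks, size_t count) {
    int best = -1;
    for (size_t i = 0; i < count; i++) {
        if (!tasks[i].ready) continue;
        if (best < 0 || tasks[i].priority > tasks[(size_t)best].priority) {
            best = (int)i;
        }
    }
    return best;
}
```

For example, if the button task (priority 3) becomes ready while the display task (priority 1) is drawing, the model selects the button task. Once it blocks again, the display task resumes.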

Decision Matrix

| Factor | Bare Metal | RTOS |
|---|---|---|
| Complexity | Simple (1-3 tasks) | Multiple concurrent activities (4+) |
| Timing | Deterministic (you control the loop) | Deterministic within priority rules |
| Latency | ISR response: immediate | Task response: within 1 tick (~1 ms) + ISR |
| Blocking IO | Cannot block in main loop | Tasks block independently |
| RAM overhead | Zero (just your stack) | Kernel ~2-4 KB + per-task stack (128-512 words each) |
| Flash overhead | Just your code | +6-10 KB for kernel |
| Debug | Simple: single thread of execution | Harder: race conditions, priority issues |
| Networking/USB | Painful (state machines) | Natural (blocking socket/transfer calls) |
| Safety-critical | Easier to certify (less code) | FreeRTOS has SAFERTOS variant (IEC 61508) |
| Power | Full control over sleep modes | Tickless idle mode available in FreeRTOS |
| MCU size | Any, including tiny 8 KB Flash | Needs at least ~20 KB Flash, ~8 KB RAM |
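
The RAM overhead row turns into a quick budget calculation. The sketch below is an estimate under stated assumptions: rtos_ram_estimate_bytes is a hypothetical helper, stack sizes are given in words as in the table, a word is assumed to be 4 bytes (a 32-bit MCU), and task control blocks and queue storage are ignored.

```c
#include <stddef.h>
#include <stdint.h>

#define BYTES_PER_STACK_WORD 4u  /* assumption: 32-bit MCU such as a Cortex-M */

/* Rough RTOS RAM estimate: kernel data plus one stack per task. */
uint32_t rtos_ram_estimate_bytes(uint32_t kernel_bytes,
                                 const uint16_t *stack_words,
                                 size_t task_count) {
    uint32_t total = kernel_bytes;
    for (size_t i = 0; i < task_count; i++) {
        total += (uint32_t)stack_words[i] * BYTES_PER_STACK_WORD;
    }
    return total;
}
```

Three tasks with 256, 128, and 512 words of stack plus a 3 KB kernel come to 3072 + 3584 = 6656 bytes, about 6.5 KB. That is consistent with the rough 8 KB RAM minimum in the last row.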

When Bare Metal Wins

  • Simple, single-purpose devices: a thermostat, an LED controller, a motor driver with one PID loop
  • Extremely resource-constrained MCUs: 8-bit AVR, tiny Cortex-M0 with 16KB Flash
  • Hard real-time with sub-microsecond jitter: ISR-driven control where even 1 tick of RTOS jitter matters
  • Safety certification: less code = less to certify
  • Learning: understand the hardware first, add abstraction later

When RTOS Wins

  • Multiple concurrent protocols: UART + SPI + I2C + USB all active
  • Networking stacks: TCP/IP (lwIP), BLE, MQTT need blocking calls and background processing
  • Complex async workflows: wait for sensor AND user input AND network response
  • Multiple developers: tasks provide natural code boundaries
  • Middleware that expects it: many vendor SDKs (STM32 USB, ESP-IDF) assume FreeRTOS

Hybrid Approaches

Real systems often mix both:

  1. Super loop + timer ISRs: the ISR+flag pattern above; many embedded projects never need more than this
  2. RTOS with bare-metal ISRs: critical control loops run in ISRs at hardware priority (above RTOS), while communication and UI run as RTOS tasks
  3. Cooperative scheduler: a minimal round-robin without preemption — lighter than FreeRTOS, more structured than raw super loop
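
// Minimal cooperative scheduler (no RTOS needed)
// Assumption: HAL_GetTick() returns a free-running millisecond count, as in the STM32 HAL.
#include <stdint.h>

typedef struct { void (*fn)(void); uint32_t period; uint32_t next; } Task;

Task tasks[] = {
    { read_sensors,    100, 0 },   // every 100ms
    { update_display,  200, 0 },   // every 200ms
    { check_comms,      50, 0 },   // every 50ms
};
#define NUM_TASKS (sizeof tasks / sizeof tasks[0])

int main(void) {
    init_all();
    while (1) {
        uint32_t now = HAL_GetTick();
        for (unsigned i = 0; i < NUM_TASKS; i++) {
            // Signed difference stays correct when the tick counter wraps
            if ((int32_t)(now - tasks[i].next) >= 0) {
                tasks[i].fn();
                tasks[i].next = now + tasks[i].period;
            }
        }
        __WFI();   // sleep until the next interrupt (typically the 1 ms tick)
    }
}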

This gives time-based scheduling without any RTOS overhead. It breaks down when tasks take longer than their period, because nothing preempts a running task and every other task waits behind it, or when you need blocking calls.
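
Deadline checks on a free-running 32-bit millisecond counter also need care at wraparound, which happens after about 49.7 days of uptime. A plain "now >= deadline" comparison gives the wrong answer when the deadline sits just before the wrap and the counter has already rolled over. A common remedy is to compare the signed difference. The helper name deadline_reached below is hypothetical, and the sketch assumes two's-complement conversion to int32_t, which mainstream embedded compilers provide.

```c
#include <stdbool.h>
#include <stdint.h>

/* True once `now` has reached or passed `deadline`, even across a counter wrap.
   Valid while the two values are less than 2^31 ticks apart. */
bool deadline_reached(uint32_t now, uint32_t deadline) {
    return (int32_t)(now - deadline) >= 0;
}
```

With a deadline of 0xFFFFFFF0 and a counter that has wrapped to 5, the unsigned difference is 21, so the deadline counts as reached. The naive comparison 5 >= 0xFFFFFFF0 would report the opposite.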