
Batch console output to one VM exit per print call#2263

Open
stlankes wants to merge 1 commit into hermit-os:main from stlankes:io

Conversation

Contributor

@stlankes stlankes commented Feb 18, 2026

Increase the serial buffer from 256 to 2048 bytes and gate the newline-triggered flush on `debug_assertions`. In release builds, data is only flushed when the buffer is full or explicitly at the end of each `_print()`/`_panic_print()` call. This reduces VM exits from ~28 per multi-line `info!()` to just 1.

I cherry-picked a PR from @jbreitbart's repository.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
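The flush policy described above can be sketched as follows. This is an illustrative model, not Hermit's actual code: the struct and method names are made up, a counter stands in for the VM exit, and an explicit boolean stands in for `cfg!(debug_assertions)` so both build modes can be shown side by side.

```rust
// Sketch of the buffering policy: a fixed-size buffer that flushes when
// full, on '\n' only when flush_on_newline is set (debug builds), and
// once at the end of each print call.

const BUFFER_SIZE: usize = 2048; // raised from 256 in this PR

struct SerialBuffer {
    buf: [u8; BUFFER_SIZE],
    len: usize,
    flush_on_newline: bool, // stands in for cfg!(debug_assertions)
    flushes: usize,         // counts simulated VM exits
}

impl SerialBuffer {
    fn new(flush_on_newline: bool) -> Self {
        Self { buf: [0; BUFFER_SIZE], len: 0, flush_on_newline, flushes: 0 }
    }

    fn write_byte(&mut self, byte: u8) {
        self.buf[self.len] = byte;
        self.len += 1;
        // A full buffer always forces a flush; newlines only do in debug builds.
        if self.len == BUFFER_SIZE || (self.flush_on_newline && byte == b'\n') {
            self.flush();
        }
    }

    fn flush(&mut self) {
        if self.len > 0 {
            // In Hermit, this is where the serial write causes a VM exit.
            self.flushes += 1;
            self.len = 0;
        }
    }

    /// Mirrors `_print()`: buffer the whole message, then flush once.
    fn print(&mut self, s: &str) {
        for &b in s.as_bytes() {
            self.write_byte(b);
        }
        self.flush();
    }
}

fn main() {
    let msg = "line 1\nline 2\nline 3\n";

    let mut release = SerialBuffer::new(false);
    release.print(msg);
    println!("release flushes: {}", release.flushes); // 1: a single VM exit

    let mut debug = SerialBuffer::new(true);
    debug.print(msg);
    println!("debug flushes: {}", debug.flushes); // 3: one per newline
}
```

With the newline flush gated off, the three-line message leaves the buffer in one write instead of three.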
@github-actions bot left a comment

Benchmark Results

| Benchmark | Current: 84b9af2 | Previous: 04ab3e2 | Performance Ratio |
|---|---|---|---|
| startup_benchmark Build Time | 102.80 s | 98.87 s | 1.04 |
| startup_benchmark File Size | 0.87 MB | 0.86 MB | 1.01 |
| Startup Time - 1 core | 0.97 s (±0.03 s) | 0.95 s (±0.03 s) | 1.01 |
| Startup Time - 2 cores | 0.99 s (±0.03 s) | 0.97 s (±0.03 s) | 1.02 |
| Startup Time - 4 cores | 0.98 s (±0.03 s) | 0.96 s (±0.03 s) | 1.02 |
| multithreaded_benchmark Build Time | 101.55 s | 101.80 s | 1.00 |
| multithreaded_benchmark File Size | 0.97 MB | 0.96 MB | 1.01 |
| Multithreaded Pi Efficiency - 2 Threads | 91.52 % (±8.68 %) | 91.38 % (±7.84 %) | 1.00 |
| Multithreaded Pi Efficiency - 4 Threads | 44.45 % (±3.82 %) | 44.41 % (±3.31 %) | 1.00 |
| Multithreaded Pi Efficiency - 8 Threads | 26.17 % (±1.71 %) | 25.72 % (±1.88 %) | 1.02 |
| micro_benchmarks Build Time | 110.64 s | 112.45 s | 0.98 |
| micro_benchmarks File Size | 0.97 MB | 0.96 MB | 1.01 |
| Scheduling time - 1 thread | 71.18 ticks (±3.56 ticks) | 71.54 ticks (±3.61 ticks) | 0.99 |
| Scheduling time - 2 threads | 41.34 ticks (±6.11 ticks) | 39.74 ticks (±4.96 ticks) | 1.04 |
| Micro - Time for syscall (getpid) | 3.10 ticks (±0.28 ticks) | 3.06 ticks (±0.30 ticks) | 1.01 |
| Memcpy speed - (built_in) block size 4096 | 64380.42 MByte/s (±46026.29 MByte/s) | 64053.45 MByte/s (±45682.00 MByte/s) | 1.01 |
| Memcpy speed - (built_in) block size 1048576 | 29563.33 MByte/s (±24631.69 MByte/s) | 29971.18 MByte/s (±24961.55 MByte/s) | 0.99 |
| Memcpy speed - (built_in) block size 16777216 | 21566.55 MByte/s (±18027.43 MByte/s) | 25003.45 MByte/s (±21013.82 MByte/s) | 0.86 |
| Memset speed - (built_in) block size 4096 | 65283.58 MByte/s (±46683.59 MByte/s) | 64316.32 MByte/s (±45851.10 MByte/s) | 1.02 |
| Memset speed - (built_in) block size 1048576 | 30322.83 MByte/s (±25056.54 MByte/s) | 30752.34 MByte/s (±25389.93 MByte/s) | 0.99 |
| Memset speed - (built_in) block size 16777216 | 22282.76 MByte/s (±18542.65 MByte/s) | 25753.82 MByte/s (±21487.22 MByte/s) | 0.87 |
| Memcpy speed - (rust) block size 4096 | 56785.63 MByte/s (±41866.87 MByte/s) | 59798.08 MByte/s (±44107.11 MByte/s) | 0.95 |
| Memcpy speed - (rust) block size 1048576 | 29478.00 MByte/s (±24564.58 MByte/s) | 29689.71 MByte/s (±24867.97 MByte/s) | 0.99 |
| Memcpy speed - (rust) block size 16777216 | 21453.24 MByte/s (±17887.44 MByte/s) | 24799.81 MByte/s (±20847.11 MByte/s) | 0.87 |
| Memset speed - (rust) block size 4096 | 57415.51 MByte/s (±42321.51 MByte/s) | 60738.74 MByte/s (±44654.61 MByte/s) | 0.95 |
| Memset speed - (rust) block size 1048576 | 30249.83 MByte/s (±24996.10 MByte/s) | 30493.17 MByte/s (±25306.10 MByte/s) | 0.99 |
| Memset speed - (rust) block size 16777216 | 22020.93 MByte/s (±18235.81 MByte/s) | 25549.21 MByte/s (±21318.77 MByte/s) | 0.86 |
| alloc_benchmarks Build Time | 107.76 s | 103.81 s | 1.04 |
| alloc_benchmarks File Size | 0.95 MB | 0.93 MB | 1.01 |
| Allocations - Allocation success | 100.00 % | 100.00 % | 1 |
| Allocations - Deallocation success | 100.00 % | 100.00 % | 1 |
| Allocations - Pre-fail Allocations | 100.00 % | 100.00 % | 1 |
| Allocations - Average Allocation time | 9494.54 Ticks (±701.57 Ticks) | 11018.15 Ticks (±189.29 Ticks) | 0.86 |
| Allocations - Average Allocation time (no fail) | 9494.54 Ticks (±701.57 Ticks) | 11018.15 Ticks (±189.29 Ticks) | 0.86 |
| Allocations - Average Deallocation time | 3688.50 Ticks (±318.31 Ticks) | 1054.71 Ticks (±332.39 Ticks) | 3.50 |
| mutex_benchmark Build Time | 107.96 s | 109.97 s | 0.98 |
| mutex_benchmark File Size | 0.97 MB | 0.96 MB | 1.01 |
| Mutex Stress Test Average Time per Iteration - 1 Threads | 13.72 ns (±0.66 ns) | 13.32 ns (±0.68 ns) | 1.03 |
| Mutex Stress Test Average Time per Iteration - 2 Threads | 16.24 ns (±0.95 ns) | 15.94 ns (±0.83 ns) | 1.02 |

This comment was automatically generated by workflow using github-action-benchmark.

Member

@mkroening mkroening left a comment

I have investigated how buffering works in glibc and Rust.

```c
// test.c
#include <stdio.h>
#include <unistd.h>

void wait(void) {
    size_t n = 0x40000000;
    for (size_t i = 0; i < n; ++i) {
        asm("");
    }
}

int main(void) {
    // Does buffer
    printf("printf");
    wait();
    printf(" does buffer\n");
    wait();

    // Does buffer
    fprintf(stdout, "fprintf(stdout)");
    wait();
    fprintf(stdout, " does buffer\n");
    wait();

    // Does not buffer
    fprintf(stderr, "fprintf(stderr)");
    wait();
    fprintf(stderr, " does not buffer\n");
    wait();

    // Does not buffer
    write(STDOUT_FILENO, "write(STDOUT_FILENO)", 20);
    wait();
    write(STDOUT_FILENO, " does not buffer\n", 17);
    wait();

    // Does not buffer
    write(STDERR_FILENO, "write(STDERR_FILENO)", 20);
    wait();
    write(STDERR_FILENO, " does not buffer\n", 17);
    wait();
}
```

Run with `gcc test.c && ./a.out`.

```rust
#!/usr/bin/env -S cargo +nightly -Zscript
---cargo
[dependencies]
log = "0.4"
env_logger = "0.11"
---

use std::io::{self, Write};

use log::error;

fn main() {
    // Does buffer
    print!("print!");
    wait();
    println!(" does buffer");
    wait();

    {
        // Does buffer
        let stdout = io::stdout();
        let mut handle = stdout.lock();

        handle.write_all(b"stdout").unwrap();
        wait();
        handle.write_all(b" does buffer\n").unwrap();
        wait();
    }

    // Does not buffer
    eprint!("eprint!");
    wait();
    eprintln!(" does not buffer");
    wait();

    {
        // Does not buffer
        let stderr = io::stderr();
        let mut handle = stderr.lock();

        handle.write_all(b"stderr").unwrap();
        wait();
        handle.write_all(b" does not buffer\n").unwrap();
        wait();
    }

    env_logger::init();

    // Does buffer
    error!(
        "log\n{}does\n{}buffer\n",
        {
            wait();
            '1'
        },
        {
            wait();
            '2'
        }
    );
}

fn wait() {
    let n = 0x4000000_u64;
    for _ in 0..n {
        std::hint::spin_loop();
    }
}
```

Run with `chmod +x test.rs && ./test.rs`.

These results seem sensible to me, and I think we should try to reproduce them with Hermit.

Specifically, this means that `sys_write` calls should never buffer. Currently, `sys_write(STDOUT_FILENO)` and `sys_write(STDERR_FILENO)` are buffered because we run them through the same console buffer; we should change that. The buffering for `fprintf(stdout)` (`printf`), `stdout` (`print!`), and `log` should happen in userspace. All stdout and stderr output via `sys_write` (especially non-text output) should never be subject to text-based buffering.

Inside the kernel, we should adopt the userspace logic as necessary.

That means we should remove the buffering from the console abstraction and move it into kernel-internal stdout and stderr abstractions that buffer stdout appropriately and don't buffer stderr. For stdout buffering, we should always flush on newlines. We could use a buffer of 8192 bytes there: glibc defines stdio.h's `BUFSIZ` as 8192, Rust defines `sys::io`'s `DEFAULT_BUF_SIZE` as `8 * 1024`, and musl defines stdio.h's `BUFSIZ` as 1024.
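To make the proposed split concrete, here is a hypothetical sketch, not Hermit's actual API: the type names, the `RawConsole` recorder, and the `write`/`write_raw` methods are all illustrative. Stdout is line-buffered (flush through the last newline, or when the buffer fills); stderr passes every write straight through.

```rust
// Illustrative sketch of line-buffered stdout vs. unbuffered stderr.

const BUF_SIZE: usize = 8192; // glibc's BUFSIZ / Rust's DEFAULT_BUF_SIZE

/// Stands in for the raw console path; each call would be a VM exit in Hermit.
#[derive(Default)]
struct RawConsole {
    writes: Vec<Vec<u8>>, // records each raw write for inspection
}

impl RawConsole {
    fn write_raw(&mut self, data: &[u8]) {
        self.writes.push(data.to_vec());
    }
}

/// Line-buffered: flushes through the last newline, or when the buffer is full.
struct Stdout {
    console: RawConsole,
    buf: Vec<u8>,
}

impl Stdout {
    fn new() -> Self {
        Self { console: RawConsole::default(), buf: Vec::new() }
    }

    fn write(&mut self, data: &[u8]) {
        self.buf.extend_from_slice(data);
        if let Some(pos) = self.buf.iter().rposition(|&b| b == b'\n') {
            // Emit everything up to and including the last newline;
            // keep the trailing partial line buffered.
            let rest = self.buf.split_off(pos + 1);
            self.console.write_raw(&self.buf);
            self.buf = rest;
        } else if self.buf.len() >= BUF_SIZE {
            let full = core::mem::take(&mut self.buf);
            self.console.write_raw(&full);
        }
    }
}

/// Unbuffered: every write goes straight to the console.
struct Stderr {
    console: RawConsole,
}

impl Stderr {
    fn write(&mut self, data: &[u8]) {
        self.console.write_raw(data);
    }
}

fn main() {
    let mut out = Stdout::new();
    out.write(b"partial");     // stays in the buffer, no raw write yet
    out.write(b" line\nnext"); // flushes "partial line\n", keeps "next"
    println!("stdout raw writes: {}", out.console.writes.len()); // 1

    let mut err = Stderr { console: RawConsole::default() };
    err.write(b"oops"); // goes out immediately, no newline required
    println!("stderr raw writes: {}", err.console.writes.len()); // 1
}
```

With this split, `sys_write` on either stream can call `write_raw` directly, so raw writes never pass through the text-based stdout buffer.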

For logging, we should add another buffer in our logging implementation, similar to env_logger's `src/writer/buffer.rs`.
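The idea behind such a per-record buffer can be sketched as follows. This is only in the spirit of env_logger's buffering, not its API: `emit`, `log_record`, and the `Vec<String>` sink are made-up names for illustration. The point is that the whole record is formatted into a temporary buffer first, so a multi-line record reaches the backend as a single write.

```rust
// Sketch of per-record log buffering: format first, then one backend write.

use std::fmt::Write as _;

/// Stands in for the unbuffered backend (one VM exit per call in Hermit).
fn emit(record: &str, sink: &mut Vec<String>) {
    sink.push(record.to_string());
}

fn log_record(level: &str, args: std::fmt::Arguments<'_>, sink: &mut Vec<String>) {
    // All formatting targets the in-memory buffer, so embedded newlines
    // never cause intermediate flushes.
    let mut buf = String::new();
    let _ = writeln!(buf, "[{level}] {args}");
    emit(&buf, sink);
}

fn main() {
    let mut sink = Vec::new();
    // A record spanning three lines still produces exactly one backend write.
    log_record("ERROR", format_args!("log\n1 does\n2 buffer"), &mut sink);
    println!("backend writes: {}", sink.len()); // 1
}
```

This mirrors the behavior the `error!` experiment above demonstrates: the multi-line log record appears all at once, after all its arguments have been evaluated.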
