
Change VerifyingKey::A_hat and SigningKey::A_hat field from stack-all… #1257

Closed
mgodf89 wants to merge 1 commit into RustCrypto:master from mgodf89:issue1256_mldsa_from_seed_memory_leak

mgodf89 commented Mar 17, 2026

Fixes #1256

ml-dsa: Box A_hat in SigningKey and VerifyingKey to reduce stack usage

Problem

from_seed for ML-DSA-65 consumes ~600KB of stack in debug builds, causing stack overflows on WASM targets and on native targets in common scenarios such as test harnesses, async runtimes, and applications that call from_seed from non-trivial call depths.

SigningKey and VerifyingKey each contain an NttMatrix<K, L> inline — 30KB each for ML-DSA-65. In from_seed, the compiler must hold the local A_hat, a clone for VerifyingKey, and the move into SigningKey simultaneously on top of all other temporaries. In debug mode without NRVO or move elision, the full frame reaches ~600KB.
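The 30KB figure can be checked with a little arithmetic. The sketch below uses stand-in types rather than the crate's real definitions, and assumes each of the 256 coefficients in a polynomial is stored in 4 bytes; ML-DSA-65 uses a 6 × 5 matrix of such polynomials.

```rust
use core::mem::size_of;

// Stand-in types (hypothetical; not the ml-dsa crate's actual definitions).
// Assumption: each polynomial stores 256 coefficients of 4 bytes each.
pub const N: usize = 256;
pub type Poly = [u32; N];

// ML-DSA-65 parameter set: K = 6 rows, L = 5 columns.
pub const K: usize = 6;
pub const L: usize = 5;
pub type Matrix = [[Poly; L]; K];

/// Size in bytes of one matrix stored inline (no heap indirection).
pub const fn matrix_bytes() -> usize {
    size_of::<Matrix>()
}

/// Size in bytes of `copies` inline copies alive at the same time.
pub const fn live_copies_bytes(copies: usize) -> usize {
    copies * matrix_bytes()
}
```

Under these assumptions one matrix is 6 × 5 × 256 × 4 = 30,720 bytes, and three simultaneous copies (the local, the clone, and the moved value) come to 90KB, which matches the amount the fix moves to the heap.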

WASM: On wasm32-wasip2 with the default 1MB stack, a single from_seed call overflows inside expand_a → Array::from_fn.

Native: On x86_64 with the default 8MB test thread stack, creating two key pairs (common in handshake/session tests) reliably overflows.

Fix

Box the A_hat field in SigningKey and VerifyingKey, moving ~90KB from stack to heap.

Changes

  • SigningKey::A_hat / VerifyingKey::A_hat: NttMatrix<P::K, P::L> → Box<NttMatrix<P::K, P::L>>
  • SigningKey::new() / VerifyingKey::new(): wrap with Box::new()
  • Three &self.A_hat * ... multiplication sites: → &*self.A_hat * ... (explicit deref required; Mul is implemented for &NttMatrix, not &Box<NttMatrix>)
  • SigningKey::verifying_key(): (*self.A_hat).clone()
  • Added extern crate alloc and use alloc::boxed::Box (gated on the existing alloc feature, which is in the default set)
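The explicit-deref point in the list above can be illustrated with a small sketch. The `Matrix`, `Vector`, and `Key` types here are hypothetical stand-ins, not the crate's types; the only thing they share with the real code is that multiplication is implemented for a reference to the matrix and not for a reference to a `Box` of it.

```rust
use std::ops::Mul;

// Hypothetical stand-ins for the crate's matrix and vector types.
#[derive(Clone)]
pub struct Matrix(pub [[u32; 2]; 2]);
pub struct Vector(pub [u32; 2]);

// `Mul` exists only for `&Matrix`, mirroring the note that it is implemented
// for `&NttMatrix` and not for `&Box<NttMatrix>`.
impl Mul<&Vector> for &Matrix {
    type Output = Vector;
    fn mul(self, rhs: &Vector) -> Vector {
        let (m, v) = (&self.0, &rhs.0);
        Vector([
            m[0][0] * v[0] + m[0][1] * v[1],
            m[1][0] * v[0] + m[1][1] * v[1],
        ])
    }
}

pub struct Key {
    // Heap-allocated: the struct itself holds only a pointer.
    a_hat: Box<Matrix>,
}

impl Key {
    pub fn new(a_hat: Matrix) -> Self {
        Self { a_hat: Box::new(a_hat) }
    }

    pub fn apply(&self, v: &Vector) -> Vector {
        // `&self.a_hat * v` would not compile: binary operators do not
        // auto-deref `&Box<Matrix>` to `&Matrix`, so deref explicitly.
        &*self.a_hat * v
    }

    pub fn matrix_copy(&self) -> Matrix {
        // Clone the matrix itself rather than the Box around it.
        (*self.a_hat).clone()
    }
}
```

Calling `Key::new(...)` followed by `apply` and `matrix_copy` exercises the same three patterns the change list describes: wrapping in `new`, `&*` at multiplication sites, and `(*...).clone()` when an unboxed copy is needed.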

No public API changes — A_hat is a private field. All existing tests pass.

Results

Measured in a debug-mode test scenario with two parties each generating keys via from_seed and performing a full session handshake:

| Metric | Before | After |
| --- | --- | --- |
| Minimum viable RUST_MIN_STACK | 32 MB | 4 MB |
| from_seed stack consumption | ~600 KB | ~100 KB |

A standalone from_seed call requires well under 200KB after this change.
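For context on the RUST_MIN_STACK row: that environment variable sets the default stack size for threads spawned through the standard library, and the same budget can be set per thread in code. The sketch below is illustrative only and not part of this PR; `stack_heavy` is a hypothetical stand-in for a stack-hungry call such as from_seed.

```rust
use std::thread;

/// Runs `f` on a new thread with `stack_bytes` of stack and returns its result.
pub fn run_with_stack<T, F>(stack_bytes: usize, f: F) -> T
where
    F: FnOnce() -> T + Send + 'static,
    T: Send + 'static,
{
    thread::Builder::new()
        .stack_size(stack_bytes) // takes precedence over RUST_MIN_STACK
        .spawn(f)
        .expect("failed to spawn thread")
        .join()
        .expect("worker thread panicked")
}

/// Hypothetical stand-in for a stack-hungry function: it keeps a 512 KiB
/// buffer in its own frame and sums it.
pub fn stack_heavy() -> usize {
    let buf = [1u8; 512 * 1024];
    // black_box keeps the optimizer from eliding the buffer.
    std::hint::black_box(&buf).iter().map(|&b| b as usize).sum()
}
```

For example, `run_with_stack(4 * 1024 * 1024, stack_heavy)` runs the stand-in with a 4 MB stack. This only sidesteps the problem on targets that have threads; it does not reduce the frame size itself.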

…ocated to Box (Box the A_hat in `new` methods) & Dereference Box when cloning A_hat for VerifyingKey in verifying_key()
use module_lattice::Truncate;
use sha3::Shake256;
use signature::{DigestSigner, DigestVerifier, MultipartSigner, MultipartVerifier, Signer};
extern crate alloc;
Member

We deliberately avoid having a hard dependency on liballoc and the heap.

This needs to at least be an optional alloc feature, although ideally the solution would not need such a dependency.
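One possible reading of the optional-feature suggestion, as a sketch only: gate the `extern crate alloc` line and the boxed storage behind `cfg(feature = "alloc")`, and fall back to inline storage otherwise. The names `MatrixStorage`, `store`, and `get` are hypothetical and do not come from the crate.

```rust
// Only link liballoc when the (assumed) `alloc` feature is enabled.
#[cfg(feature = "alloc")]
extern crate alloc;

// With `alloc`: keep the matrix on the heap. Without it: keep it inline.
#[cfg(feature = "alloc")]
pub type MatrixStorage<M> = alloc::boxed::Box<M>;
#[cfg(not(feature = "alloc"))]
pub type MatrixStorage<M> = M;

/// Wraps a matrix in whichever storage the build configuration selects.
#[cfg(feature = "alloc")]
pub fn store<M>(m: M) -> MatrixStorage<M> {
    alloc::boxed::Box::new(m)
}
#[cfg(not(feature = "alloc"))]
pub fn store<M>(m: M) -> MatrixStorage<M> {
    m
}

/// Borrows the underlying matrix. In the boxed configuration the `&Box<M>`
/// coerces to `&M`; in the inline configuration it is already `&M`.
pub fn get<M>(s: &MatrixStorage<M>) -> &M {
    s
}
```

Call sites written against `get(&self.a_hat)` would then compile in both configurations, so builds without the feature keep no dependency on the heap.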

@tarcieri
Member

Closing as this adds a hard dependency on liballoc

@tarcieri tarcieri closed this Mar 17, 2026
Development

Successfully merging this pull request may close these issues.

from_seed / KeyGen stack overflow on WASM and constrained-stack environments (ML-DSA-65/87)
