Tensor scale nvfp4 #3022

nastya236 · 2026-01-20T15:39:17Z

Add per tensor scale for `nvfp4` quantization for `cuda` and `cpu`.

`qqmm`, `quantize`, and `dequantize` accept an optional 1D float32 array (`global_scale`) when `mode == "nvfp4"`.

Some files related to `qqmm` were also refactored.

Important details:

[qqmm] Currently, if `global_scale` is provided for the first input, it must also be provided for the second input. The global scales are passed as inputs in `QQMatmul::eval_gpu()`, so with only one of them present we can't distinguish between `global_scale_x` and `global_scale_w`.
`alpha` and `beta` must both be device pointers or both be host pointers. Therefore, if alpha is a device ptr,
Tensor scale will help with small inputs:

import mlx.core as mx

x = mx.random.uniform(shape=(2, 16)) / 1e5
xq_ns, scales_ns = mx.quantize(x, mode="nvfp4")
global_scale = mx.absmax(x).astype(mx.float32)
xq_s, scales_s = mx.quantize(x, mode="nvfp4", global_scale=global_scale)

print(mx.allclose(scales_ns, mx.zeros_like(scales_ns)))
print(mx.allclose(scales_s, mx.zeros_like(scales_s)))
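
To make the underflow in the example above concrete, here is a rough NumPy model of how a block scale is stored in fp8 e4m3. It is a sketch under assumptions, not the MLX kernel: `round_e4m3` is a hypothetical helper written for this illustration, and the convention `tensor_scale = absmax / (6 * 448)` is an assumed way of applying a per-tensor scale that may differ from how this PR uses `global_scale`.

```python
import numpy as np

E4M3_MAX = 448.0  # largest finite fp8 e4m3 value
FP4_MAX = 6.0     # largest fp4 (e2m1) magnitude


def round_e4m3(v):
    """Round to the nearest fp8 e4m3 value (simplified: saturating, no NaN)."""
    v = np.asarray(v, dtype=np.float64)
    a = np.minimum(np.abs(v), E4M3_MAX)
    # Spacing is 2**(e - 3) for normal values; below 2**-6 it stays at
    # 2**-9, which models the subnormal range.
    e = np.floor(np.log2(np.maximum(a, 2.0**-6)))
    step = 2.0 ** (e - 3)
    return np.sign(v) * np.round(a / step) * step


# One 16-element block of very small values, similar in magnitude to the
# example above.
x = np.linspace(1e-6, 1e-5, 16)
block_amax = np.abs(x).max()

# Without a tensor scale, the block scale is far below the smallest e4m3
# step (2**-9) and rounds to zero.
scale_plain = float(round_e4m3(block_amax / FP4_MAX))

# With a tensor scale (assumed convention), the block scale is first
# normalized into the e4m3 range and survives rounding.
tensor_scale = np.abs(x).max() / (FP4_MAX * E4M3_MAX)
scale_global = float(round_e4m3(block_amax / FP4_MAX / tensor_scale))

print(scale_plain, scale_global)
```

In this model the unscaled block scale collapses to zero, so every value in the block would dequantize to zero, while the normalized block scale stays representable.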

TODO:

We probably want to support `global_scale` in Metal as well, but that requires changing all quantized operations.
It is not yet clear what the best strategy is for computing scales during training (for x, w, and the cotangent), so QQLinear does not support global scale yet. I will add it in a separate PR after some exploration.
`fp_qmv_impl` has not yet been updated to support global scale.


awni · 2026-02-11T01:01:18Z

mlx/backend/cuda/quantized/qqmm_impl.cpp

      qmode);

-  qqmm.run(encoder, out, a, b, a_scale, b_scale, alpha);
+  if (scalars.uses_device_pointers()) {


The name of that method is a little confusing. Maybe it would make more sense to call it has_values() or something?

awni

Sorry I left one last minor comment. Otherwise this looks great, we should merge it!

nastya236 and others added 8 commits January 16, 2026 00:47

adding tensor scale [wip]

98eedd1

Merge branch 'main' into tensor-scale-nvfp4

438830a

added absmax reduction, changed fp_quanitze api [wip]

6892404

refactoring

15d684b

Merge branch 'ml-explore:main' into tensor-scale-nvfp4

8c67953

alpha device ptr for qqmm

a7fab99

Merge branch 'tensor-scale-nvfp4' of https://github.com/nastya236/mlx into tensor-scale-nvfp4

9fdfce6

device alpha, beta

47be994

nastya236 closed this Jan 20, 2026

nastya236 reopened this Jan 20, 2026

nastya236 changed the title ~~Tensor scale nvfp4~~ [WIP] Tensor scale nvfp4 Jan 20, 2026

nastya236 added 2 commits January 20, 2026 20:43

harcoded absmax to output float

7e4c6e8

fixed ops python dequantize

11ff19a

awni reviewed Jan 20, 2026

python/src/ops.cpp (outdated, resolved)

nastya236 added 15 commits January 21, 2026 00:52

input global_scale

2a86dc1

fix global_scale

2c68fb6

Merge branch 'main' into tensor-scale-nvfp4

abe37c2

fix scale to be float(fp8e4m3(scale))

277ceeb

removed AbsMax reduction (probably add back in the future as a separate PR)

dad7e57

Merge branch 'main' into tensor-scale-nvfp4

3d7ebd9

fix columnwise quantize scale, precommit

0a804a9

abs_max

7ca2642

fix

934c0c8

fixed the fallback, fixed absmax

1fea025

fix docs, remove the diff

306acd0

fix docs, delete debuging print

7492841

Merge branch 'main' into tensor-scale-nvfp4

5503802

reverted the example

f49abe5

abs_max -> absmax

37e5789

nastya236 changed the title ~~[WIP] Tensor scale nvfp4~~ Tensor scale nvfp4 Jan 23, 2026

nastya236 added 13 commits January 24, 2026 14:22

fix abs type

20480ef

fix fp type for vjp

9f9aabd

decrease block size because of the register pressure

d2dc310

Merge branch 'main' into tensor-scale-nvfp4

76cd3b4

drop absmax

79d93e6

merge conflict fp-quantize

d91fd8a

add scale to fp_quantiz-dequantize, fix merge conflicts, refactor

5cbf48f

pre-commit + update a comment

da1cacf

revert qq_linear global scale [WIP]

67385c8

refactoring, revert block size

ad1fcf1

Merge remote-tracking branch 'upstream/main' into tensor-scale-nvfp4

019a31d

revert the year change

b1dcd2f

Merge branch 'main' into tensor-scale-nvfp4

e38077d