fix: FP8 fallback for AIU addons running on CPU #200
chichun-charlie-liu merged 10 commits into main
Conversation
Signed-off-by: Andrea Fasoli <andrea.fasoli@ibm.com>
@ani300 need your eyes on this
ani300
left a comment
lgtm! the fix makes sense
Is it worth adding a test to check that the combination that was failing before works now and in the future?
Signed-off-by: Andrea Fasoli <andrea.fasoli@ibm.com>
@ani300 I added some tests to verify the FP8 CPU support. I also fixed a bug in FP8Linear where the non-quantized activation path could lead to a mismatched dtype in the matmul. As you are aware, there have been changes to FP8 in torchao > 0.11: the handling of scales for FP8 tensors seems to be different, and this will break the new fallback path of FP8 on CPU.
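For context, the dtype-mismatch bug described above is of the kind sketched below. This is an illustration only, not the actual FP8Linear code: the tensor names and the bf16/fp32 combination are assumptions, but they show why a non-quantized activation path needs the dequantized weight cast to the activation dtype before the matmul.

```python
import torch
import torch.nn.functional as F

# Illustration of the class of bug: activations arrive non-quantized
# (e.g. bf16) while the dequantized weight is still fp32.
x = torch.randn(2, 4, dtype=torch.bfloat16)   # non-quantized activations
w = torch.randn(8, 4, dtype=torch.float32)    # dequantized weight

try:
    F.linear(x, w)  # mixed-dtype matmul raises on CPU
except RuntimeError as e:
    print("dtype mismatch:", type(e).__name__)

# Fix: align the weight dtype with the activation dtype before the matmul.
out = F.linear(x, w.to(x.dtype))
print(out.dtype)  # torch.bfloat16
```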
@ani300 Unrelated to this PR, I noticed a suspicious assignment: I suspect we should be using "weights" instead of "input_activations" to load the weight_strategy. Do you recall if there was any specific reason for this choice?
@andrea-fasoli it's been a while and I don't remember why I picked this particular field, but it probably has to do with how the FP8 checkpoint comes out of llm-compressor.
Thanks @andrea-fasoli for this PR! We can test the fixes out and verify this [issue](https://github.ibm.com/ai-foundation/aiu-app-sw-tracker/issues/1732).
lgtm, just a suggestion: maybe we can add a reminder comment in pyproject.toml on the line of
=========================== short test summary info ============================
Tests failed because Hugging Face does not allow GitHub to access BERT (unclear if this is rate limiting within a certain window or an overall cap; we may need to disable this test in the future). Merging for now.
Description of the change
Starting from PyTorch 2.10, `torch._scaled_mm` no longer supports FP8 matmul on CPU for any quantization scheme other than per-tensor. `torch._scaled_mm` (through a call to `addmm_float8_unwrapped_inference`) is currently called by the FP8 AIU addons when the model runs on CPU.
This PR implements a fallback for this scenario: we perform a mock FP8 x FP8 matmul on CPU using `torch.nn.functional.linear` between quantized/dequantized activations and dequantized weights. Notice that we do not simply dequantize the FP8 weights; we also mock the activations as FP8.
Related issues or PRs
[internal issue]
How to verify the PR
Example of a test that should pass, run on a pod with 4 AIUs, in PF mode, in a PyTorch 2.10 env (set up env vars according to your case; AFTU = aiu-fms-testing-utils repo):
Was the PR tested
Checklist for passing CI/CD:
- `git commit --signoff` or equivalent
- `pre-commit`
- `tox -e unit`