Closed
Description
Hi, I have run SDXL with tensor parallelism (TP) as well as sequence parallelism. My PR is below; I hope it helps anyone who needs it.
Motivation:
I wanted to avoid gradient checkpointing so as to get higher throughput when the inputs have a higher resolution, such as 720p.
However, tensor parallelism comes at a cost, and I did not gain any throughput from TP (tested with 720*1080 inputs on an A100, batch size 16, with AMP).
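To show roughly where the cost of TP comes from, here is a small sketch. It is not the code from the PR. It assumes Megatron-style sharding of a feed-forward block (a column-parallel first linear followed by a row-parallel second linear) and simulates the "ranks" in a single NumPy process. The names `mlp_tensor_parallel` and `world_size` and the layer sizes are hypothetical. Each sharded block ends with a sum of partial outputs, which on real GPUs is an all-reduce. Paying for that communication in every block is one plausible reason TP did not improve throughput here.

```python
import numpy as np

def gelu(x):
    # tanh approximation of GELU; elementwise, so it can be applied per shard
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def mlp_reference(x, w1, w2):
    # Unsharded feed-forward block: y = gelu(x @ w1) @ w2
    return gelu(x @ w1) @ w2

def mlp_tensor_parallel(x, w1, w2, world_size):
    # Column-parallel first linear: each simulated rank owns a slice of w1's output columns.
    w1_shards = np.split(w1, world_size, axis=1)
    # Row-parallel second linear: each rank owns the matching slice of w2's input rows.
    w2_shards = np.split(w2, world_size, axis=0)
    partials = []
    for rank in range(world_size):
        # No communication needed here: GELU acts elementwise on the local slice.
        h_local = gelu(x @ w1_shards[rank])
        partials.append(h_local @ w2_shards[rank])
    # Summing the partial outputs stands in for the all-reduce that real TP
    # performs across GPUs at the end of every sharded block.
    return np.sum(partials, axis=0)

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 64))      # (tokens, hidden)
w1 = rng.standard_normal((64, 256))   # hidden -> 4 * hidden
w2 = rng.standard_normal((256, 64))   # 4 * hidden -> hidden

y_ref = mlp_reference(x, w1, w2)
y_tp = mlp_tensor_parallel(x, w1, w2, world_size=2)
print(np.allclose(y_ref, y_tp))  # True
```

The sharded result matches the unsharded one, so TP changes only where the work happens. Each extra block that is sharded adds another cross-GPU reduction.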
In case someone has the same idea or wants to run tensor parallelism on more blocks, here are my code changes: