[PyTorch] Make grouped weights opt-in #2678
Conversation
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
/te-ci pytorch
Greptile Overview

Greptile Summary

This PR makes the grouped weights feature opt-in by checking the environment variable NVTE_ALLOC_CONTIGUOUS_GROUPED_LINEAR_WEIGHTS.

Changes:

Note: The PR description mentions the variable name is temporary and will be replaced with a single-parameter option in future work.

Confidence Score: 4/5
Important Files Changed
Sequence Diagram

```mermaid
sequenceDiagram
    participant Test as test_sanity_grouped_linear
    participant Env as Environment
    participant GL as GroupedLinear
    participant GT as GroupedTensor

    alt single_param=True
        Test->>Env: Set NVTE_ALLOC_CONTIGUOUS_GROUPED_LINEAR_WEIGHTS=1
    end
    Test->>GL: Create GroupedLinear instance
    GL->>GL: __init__() registers weight parameters
    GL->>GL: reset_parameters()
    alt Environment variable set
        GL->>Env: Check NVTE_ALLOC_CONTIGUOUS_GROUPED_LINEAR_WEIGHTS
        Env-->>GL: "1" (enabled)
        GL->>GL: make_grouped_weights()
        GL->>GT: Create contiguous GroupedTensor storage
        GT-->>GL: Return grouped tensors
        GL->>GL: Re-register weights as GroupedTensor views
    else Environment variable not set
        GL->>Env: Check NVTE_ALLOC_CONTIGUOUS_GROUPED_LINEAR_WEIGHTS
        Env-->>GL: "0" (disabled)
        Note over GL: Skip make_grouped_weights(), use separate weight tensors
    end
    Test->>GL: Forward and backward pass
    GL-->>Test: Results
    alt single_param=True
        Test->>Env: Delete NVTE_ALLOC_CONTIGUOUS_GROUPED_LINEAR_WEIGHTS
    end
```
Last reviewed commit: 197f2e6
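The flow in the diagram can be exercised end to end. A minimal sketch, assuming `transformer_engine.pytorch.GroupedLinear` with its usual `(num_gemms, in_features, out_features)` signature and a CUDA device; note the envvar must be set before the module is constructed, since it is read inside `reset_parameters()`:

```python
import os

import torch
import transformer_engine.pytorch as te

# Opt in before construction: the envvar is read inside reset_parameters().
os.environ["NVTE_ALLOC_CONTIGUOUS_GROUPED_LINEAR_WEIGHTS"] = "1"
try:
    # 4 GEMMs, each 128 -> 128; with the flag set, the weights are
    # allocated as views into one contiguous GroupedTensor buffer.
    layer = te.GroupedLinear(4, 128, 128, device="cuda")
    x = torch.randn(64, 128, device="cuda")
    m_splits = [16, 16, 16, 16]  # how many rows of x go to each GEMM
    out = layer(x, m_splits)
    out.sum().backward()
finally:
    # Mirror the test's cleanup so later modules see the default behavior.
    del os.environ["NVTE_ALLOC_CONTIGUOUS_GROUPED_LINEAR_WEIGHTS"]
```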
| if bool(int(os.getenv("NVTE_GROUPED_LINEAR_SINGLE_PARAM", "0"))): | ||
| self.make_grouped_weights(defer_init=defer_init) |
We're not actually exposing a single param, but rather we're sneakily allocating the weights to be contiguous.
| if bool(int(os.getenv("NVTE_GROUPED_LINEAR_SINGLE_PARAM", "0"))): | |
| self.make_grouped_weights(defer_init=defer_init) | |
| if bool(int(os.getenv("NVTE_ALLOC_CONTIGUOUS_GROUPED_LINEAR_WEIGHTS", "0"))): | |
| self.make_grouped_weights(defer_init=defer_init) |
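The renamed variable matches what actually happens: the weights are allocated back-to-back in one buffer. A rough diagnostic, reusing `layer` from the earlier sketch and assuming the parameters are exposed as `weight0 … weight{num_gemms-1}` (as in the existing tests) and that the grouped buffer packs them without padding:

```python
# Rough check, not part of the PR: with the envvar set, consecutive
# weights should sit end to end in a single allocation.
ws = [getattr(layer, f"weight{i}") for i in range(4)]
for a, b in zip(ws, ws[1:]):
    assert b.data_ptr() == a.data_ptr() + a.numel() * a.element_size()
```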
Constructor kwargs are also much more Pythonic than these hidden envvars. It's not terrible for now since this is an internal implementation detail, but we should expose a proper kwarg when we actually add the option for a single param.
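For illustration, the kwarg-based interface the comment argues for might look like the following; `single_weight_param` is a hypothetical name, not an existing parameter:

```python
# Hypothetical future API (sketch only): opt in per module instead of
# through a process-wide environment variable.
layer = te.GroupedLinear(
    4, 128, 128,
    single_weight_param=True,  # hypothetical kwarg replacing the envvar
)
```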
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
```python
super().reset_parameters(defer_init=defer_init)
# Grouped tensor weights is an opt-in feature.
if bool(int(os.getenv("NVTE_ALLOC_CONTIGUOUS_GROUPED_LINEAR_WEIGHTS", "0"))):
    self.make_grouped_weights(defer_init=defer_init)
```
This parsing will raise ValueError if the envvar is set to non-numeric values like "true", "false", or empty string. Consider more robust parsing if users might set this directly:
| if bool(int(os.getenv("NVTE_ALLOC_CONTIGUOUS_GROUPED_LINEAR_WEIGHTS", "0"))): | |
| if os.getenv("NVTE_ALLOC_CONTIGUOUS_GROUPED_LINEAR_WEIGHTS", "0").lower() in ("1", "true", "yes"): |
Description
Follow-up to #2654 that makes grouped weights opt-in via an environment variable. The variable name is inaccurate for now, but upcoming work will convert this opt-in option to use a single parameter.
Type of change
Changes
Checklist: