
Fix OneDFT MPI double-counting of EXC energy #187

Merged
awvwgk merged 1 commit into wavefunction91:skala from tvogels:fix/onedft-mpi-double-counting on Mar 19, 2026



@tvogels commented Mar 19, 2026

In the OneDFT integrator path, only rank 0 computes the XC energy from the neural network model. The energy is then allreduced with MPI_SUM across all MPI ranks. On repeated calls, non-rank-0 ranks still hold the correct energy value from the previous allreduce, so the MPI_SUM reduction yields 2× the correct value.
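The arithmetic generalizes beyond two ranks. Assuming $N$ ranks, that rank 0 computes the new energy $E$, and that each of the other $N-1$ ranks still holds the previous call's reduced energy $E_{\text{prev}}$, the reduction returns:

$$
E_{\text{reduced}} = E + \sum_{r=1}^{N-1} E_{\text{prev}} = E + (N-1)\,E_{\text{prev}}
$$

When the energy is unchanged between calls ($E_{\text{prev}} = E$), this is $N E$, which is the 2× above for $N = 2$.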

Fix: Zero EXC[0] on non-rank-0 processes before the allreduce, so only rank 0's contribution is summed. The change is applied to both the host and device integrator paths.
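A minimal sketch of the bug and the fix, assuming nothing about the actual integrator code: MPI is replaced by a hypothetical `simulated_allreduce_sum` that behaves like an in-place `MPI_Allreduce` with `MPI_SUM` (every rank contributes one value and every rank receives the total). `onedft_exc_call`, `run_calls`, `exc_per_rank`, and the `zero_non_root` flag are hypothetical names for illustration. It also assumes EXC starts at zero on every rank and that the model energy is the same on every call.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical stand-in for an in-place MPI_Allreduce with MPI_SUM:
// entry r is rank r's local EXC[0]; every rank receives the global sum.
void simulated_allreduce_sum(std::vector<double>& exc_per_rank) {
  const double total =
      std::accumulate(exc_per_rank.begin(), exc_per_rank.end(), 0.0);
  for (double& v : exc_per_rank) v = total;
}

// One OneDFT-style integrator call (hypothetical): only rank 0 evaluates the
// model energy. With zero_non_root == true, non-root ranks clear their stale
// EXC[0] before the reduction, mirroring the fix described in this PR.
void onedft_exc_call(std::vector<double>& exc_per_rank, double model_energy,
                     bool zero_non_root) {
  assert(!exc_per_rank.empty());   // rank 0 must exist
  exc_per_rank[0] = model_energy;  // rank 0 writes its freshly computed energy
  if (zero_non_root) {
    for (std::size_t r = 1; r < exc_per_rank.size(); ++r) {
      exc_per_rank[r] = 0.0;  // discard the value left by the previous reduction
    }
  }
  simulated_allreduce_sum(exc_per_rank);
}

// Repeats the call num_calls times and returns the energy seen on rank 0.
// Assumes EXC starts at zero on every rank and the energy does not change.
double run_calls(std::size_t num_ranks, int num_calls, double model_energy,
                 bool zero_non_root) {
  std::vector<double> exc_per_rank(num_ranks, 0.0);
  for (int c = 0; c < num_calls; ++c) {
    onedft_exc_call(exc_per_rank, model_energy, zero_non_root);
  }
  return exc_per_rank[0];
}
```

With an illustrative energy of -1.5, `run_calls(2, 2, -1.5, false)` returns -3.0 (twice the energy on the second call), while `run_calls(2, 2, -1.5, true)` returns -1.5. The first call is correct either way because the non-root entries start at zero; the stale values only appear from the second call onward.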

This error was caught by a failing MPI test, which passes after this change.


Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@awvwgk merged commit ab5c80e into wavefunction91:skala on Mar 19, 2026
0 of 11 checks passed

2 participants