
Turing WPI support #1335

Draft
dgvacarevelo wants to merge 1 commit into MFlowCode:master from dgvacarevelo:master

Conversation

@dgvacarevelo
Contributor

Description

This PR adds support for the Turing supercomputer, an HPC cluster hosted by Worcester Polytechnic Institute (WPI).

Type of change

  • Bug fix
  • New feature
  • Refactor
  • Documentation
  • Other: describe

Testing

How did you test your changes?

Checklist

  • I added or updated tests for new behavior
  • I updated documentation if user-facing behavior changed

See the developer guide for full coding standards.

GPU changes (expand if you modified src/simulation/)
  • GPU results match CPU results
  • Tested on NVIDIA GPU or AMD GPU

AI code reviews

Reviews are not triggered automatically. To request a review, comment on the PR:

  • @coderabbitai review — incremental review (new changes only)
  • @coderabbitai full review — full review from scratch
  • /review — Qodo review
  • /improve — Qodo code suggestions
  • @claude full review — Claude full review (also triggers on PR open/reopen/ready)
  • Add label claude-full-review — Claude full review via label

@coderabbitai
Contributor

coderabbitai bot commented Mar 24, 2026

📝 Walkthrough

This pull request adds WPI Turing support to the toolchain infrastructure. A new interactive menu option is exposed in the bootstrap modules selection prompt. The toolchain modules configuration introduces a new cluster group with CPU-variant modules pinning GCC/12.1.0, OpenMPI/4.1.3, and Python/3.13.5, and GPU-variant modules pinning NVHPC/24.3 and Python/3.13.5, along with compiler environment variables and MPI configuration. A new Mako template generates HPC batch job scripts with conditional Slurm directives, module loading, and per-target execution supporting both MPI and non-MPI modes with optional GPU resource allocation.
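The conditional-directive behavior described above can be sketched in plain shell. This is a hypothetical illustration of the pattern only — the function name `slurm_header`, its arguments, and its defaults are assumptions for this sketch, not the PR's actual Mako template:

```shell
# Hypothetical sketch of the conditional Slurm-directive pattern: GPU and
# e-mail directives are emitted only when requested. Not the PR's template.
slurm_header() {
    # args: job-name nodes tasks-per-node gpu(yes/no) email(may be empty)
    name=$1; nodes=$2; tasks=$3; gpu=$4; email=$5
    printf '#SBATCH --job-name=%s\n' "$name"
    printf '#SBATCH --nodes=%s\n' "$nodes"
    printf '#SBATCH --ntasks-per-node=%s\n' "$tasks"
    printf '#SBATCH --export=ALL\n'
    if [ "$gpu" = yes ]; then
        # Values mirror the template under review; they are Turing-specific.
        printf '#SBATCH --gres=gpu:1\n'
        printf '#SBATCH -C "A30|A100"\n'
    fi
    if [ -n "$email" ]; then
        printf '#SBATCH --mail-user=%s\n' "$email"
        printf '#SBATCH --mail-type=BEGIN,END,FAIL\n'
    fi
}

slurm_header mfc_case 2 8 yes ""
```

The same shape — an always-present core header plus optionally emitted GPU and mail directives — is what the Mako template expresses with `% if` blocks.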

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check: Description check — ⚠️ Warning
Explanation: The description is largely incomplete. It provides minimal detail about the changes, lacks testing information, and has unchecked checklist items. No issue reference is included, and testing details are empty.
Resolution: Add details about what was changed (modules, templates, bootstrap), explain the testing performed, and check the appropriate checklist items to indicate completion status.
✅ Passed checks (2 passed)
  • Title check ✅ Passed — The title 'Turing WPI support' clearly and concisely describes the main change: adding support for the Turing supercomputer at Worcester Polytechnic Institute.
  • Docstring Coverage ✅ Passed — No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.



@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🧹 Nitpick comments (1)
toolchain/templates/turing.mako (1)

17-21: Hardcoded single GPU and memory constraints may limit flexibility.

The GPU configuration hardcodes:

  • --gres=gpu:1 - Only 1 GPU regardless of job requirements
  • --mem=208G - Fixed memory allocation
  • -C "A30|A100" - Specific GPU constraints

This may be appropriate for Turing's node configuration, but consider whether multi-GPU jobs are needed.

Please confirm with WPI cluster administrators that:

  1. Single GPU per job is the intended use pattern
  2. 208G is the appropriate memory allocation for GPU nodes
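If multi-GPU jobs do become necessary, one possible direction is to let the template accept overrides while keeping the current values as defaults. A purely hypothetical Mako sketch — the `gpus_per_node` and `mem_per_node` parameter names are illustrative assumptions, not part of this PR:

```mako
% if gpu_enabled:
#SBATCH --gres=gpu:${gpus_per_node or 1}
#SBATCH --mem=${mem_per_node or '208G'}
#SBATCH -C "A30|A100"
% endif
```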

ℹ️ Review info
⚙️ Run configuration

Configuration used: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 2454b169-22c8-4900-b9d8-ae3d855e67c8

📥 Commits

Reviewing files that changed from the base of the PR and between 90d4b6e and faa0693.

📒 Files selected for processing (3)
  • toolchain/bootstrap/modules.sh
  • toolchain/modules
  • toolchain/templates/turing.mako

```mako
#SBATCH --export=ALL
% if email:
#SBATCH --mail-user=${email}
#SBATCH --mail-type="BEGIN, END, FAIL"
```

⚠️ Potential issue | 🟡 Minor

Minor: Mail-type value has spaces which may cause parsing issues.

The mail-type directive has spaces: "BEGIN, END, FAIL". Some Slurm versions expect comma-separated values without spaces.

🔧 Proposed fix
```diff
-#SBATCH --mail-type="BEGIN, END, FAIL"
+#SBATCH --mail-type=BEGIN,END,FAIL
```

Comment on lines +50 to +53
```sh
(set -x; ${profiler} \
    srun --gres=gpu:1 -C "A30|A100" \
    $MPI_HOME/bin/mpirun --np ${nodes*tasks_per_node} \
    "${target.get_install_binpath(case)}")
```

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

```sh
# Find and read the turing.mako template file
find . -name "turing.mako" -type f | head -5
```

Repository: MFlowCode/MFC

Length of output: 90


🏁 Script executed:

```sh
# List all .mako template files in the toolchain directory
ls -la toolchain/templates/*.mako 2>/dev/null || find toolchain -name "*.mako" -type f
```

Repository: MFlowCode/MFC

Length of output: 1384


🏁 Script executed:

```sh
# Read the turing.mako file to verify the actual code
if [ -f "toolchain/templates/turing.mako" ]; then
  cat -n toolchain/templates/turing.mako | sed -n '40,65p'
fi
```

Repository: MFlowCode/MFC

Length of output: 1153


🏁 Script executed:

```sh
# Also check other template files to see the pattern used elsewhere
fd -e mako toolchain/ -x echo "File: {}" \; -x head -70 {}
```

Repository: MFlowCode/MFC

Length of output: 402


🏁 Script executed:

```sh
# Check other template files for comparison - particularly GPU+MPI patterns
for file in toolchain/templates/*.mako; do
  echo "=== $(basename $file) ==="
  cat -n "$file" | grep -A 5 -B 2 "gpu_enabled\|mpirun\|srun" || echo "(no GPU/MPI pattern found)"
done
```

Repository: MFlowCode/MFC

Length of output: 14827


🏁 Script executed:

```sh
# Search for documentation or comments about the MPI launcher choice
rg -i "mpirun|srun.*mpi|gpu.*mpi" toolchain/ docs/ --max-count=5
```

Repository: MFlowCode/MFC

Length of output: 5158


🏁 Script executed:

```sh
# Check if there's any cluster-specific documentation
find . -name "*turing*" -type f | head -10
```

Repository: MFlowCode/MFC

Length of output: 90


🏁 Script executed:

```sh
# Check if there's any Turing cluster-specific documentation or comments
find . -type f \( -name "*.md" -o -name "*.txt" -o -name "*.sh" \) | xargs grep -l -i "turing" | head -10
```

Repository: MFlowCode/MFC

Length of output: 138


🏁 Script executed:

```sh
# Look at hipergator.mako more closely since it also uses NVHPC
cat -n toolchain/templates/hipergator.mako | sed -n '49,64p'
```

Repository: MFlowCode/MFC

Length of output: 860


Fix inconsistency: GPU MPI path should use srun --mpi=pmi2 like the CPU path, not wrap mpirun.

The GPU execution (lines 50-53) unusually combines srun with mpirun, which can cause process management conflicts. The CPU path (lines 55-57) correctly uses srun --mpi=pmi2. For consistency and to avoid launcher conflicts, the GPU path should follow the same pattern:

Suggested fix
```diff
        % if gpu_enabled:
            (set -x; ${profiler}                                    \
-                srun --gres=gpu:1 -C "A30|A100"                     \
-                $MPI_HOME/bin/mpirun --np ${nodes*tasks_per_node}   \
+                srun --mpi=pmi2 --gres=gpu:1 -C "A30|A100"          \
+                --ntasks=${nodes*tasks_per_node}                    \
                "${target.get_install_binpath(case)}")
```

@codecov

codecov bot commented Mar 24, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 45.01%. Comparing base (6120810) to head (faa0693).
⚠️ Report is 1 commit behind head on master.

Additional details and impacted files
```
@@           Coverage Diff           @@
##           master    #1335   +/-   ##
=======================================
  Coverage   45.01%   45.01%           
=======================================
  Files          70       70           
  Lines       20562    20562           
  Branches     1962     1962           
=======================================
  Hits         9255     9255           
  Misses      10179    10179           
  Partials     1128     1128           
```

☔ View full report in Codecov by Sentry.

@sbryngelson sbryngelson marked this pull request as draft March 24, 2026 19:12