Turing WPI support #1335
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,66 @@ | ||
| #!/usr/bin/env bash | ||
|
|
||
| <%namespace name="helpers" file="helpers.mako"/> | ||
|
|
||
| % if engine == 'batch': | ||
| #SBATCH --nodes=${nodes} | ||
| #SBATCH --ntasks-per-node=${tasks_per_node} | ||
| #SBATCH --cpus-per-task=1 | ||
| #SBATCH --job-name="${name}" | ||
| #SBATCH --time=${walltime} | ||
| % if partition: | ||
| #SBATCH --partition=${partition} | ||
| % endif | ||
| % if account: | ||
| #SBATCH --account="${account}" | ||
| % endif | ||
| % if gpu_enabled: | ||
| #SBATCH --gres=gpu:1 | ||
| #SBATCH --mem=208G | ||
| #SBATCH -C "A30|A100" | ||
| % endif | ||
| #SBATCH --output="${name}.out" | ||
| #SBATCH --error="${name}.err" | ||
| #SBATCH --export=ALL | ||
| % if email: | ||
| #SBATCH --mail-user=${email} | ||
| #SBATCH --mail-type="BEGIN, END, FAIL" | ||
| % endif | ||
| % endif | ||
|
|
||
| ${helpers.template_prologue()} | ||
|
|
||
| ok ":) Loading modules:\n" | ||
| cd "${MFC_ROOT_DIR}" | ||
| . ./mfc.sh load -c t -m ${'g' if gpu_enabled else 'c'} | ||
| cd - > /dev/null | ||
| echo | ||
|
|
||
| % if gpu_enabled: | ||
| export LD_LIBRARY_PATH=/cm/shared/spack/opt/spack/linux-ubuntu20.04-x86_64/gcc-13.2.0/cuda-12.3.0-vuydybqum6mloi2vvov7yn2juaurmtao/lib64:$LD_LIBRARY_PATH | ||
| % endif | ||
|
|
||
| % for target in targets: | ||
| ${helpers.run_prologue(target)} | ||
|
|
||
| % if not mpi: | ||
| (set -x; ${profiler} "${target.get_install_binpath(case)}") | ||
| % else: | ||
| % if gpu_enabled: | ||
| (set -x; ${profiler} \ | ||
| srun --gres=gpu:1 -C "A30|A100" \ | ||
| $MPI_HOME/bin/mpirun --np ${nodes*tasks_per_node} \ | ||
| "${target.get_install_binpath(case)}") | ||
|
Comment on lines +50 to +53

Contributor

🧩 Analysis chain

🏁 Script executed:
# Find and read the turing.mako template file
find . -name "turing.mako" -type f | head -5
Repository: MFlowCode/MFC; length of output: 90

🏁 Script executed:
# List all .mako template files in the toolchain directory
ls -la toolchain/templates/*.mako 2>/dev/null || find toolchain -name "*.mako" -type f
Repository: MFlowCode/MFC; length of output: 1384

🏁 Script executed:
# Read the turing.mako file to verify the actual code
if [ -f "toolchain/templates/turing.mako" ]; then
  cat -n toolchain/templates/turing.mako | sed -n '40,65p'
fi
Repository: MFlowCode/MFC; length of output: 1153

🏁 Script executed:
# Also check other template files to see the pattern used elsewhere
fd -e mako toolchain/ -x echo "File: {}" \; -x head -70 {}
Repository: MFlowCode/MFC; length of output: 402

🏁 Script executed:
# Check other template files for comparison - particularly GPU+MPI patterns
for file in toolchain/templates/*.mako; do
  echo "=== $(basename $file) ==="
  cat -n "$file" | grep -A 5 -B 2 "gpu_enabled\|mpirun\|srun" || echo "(no GPU/MPI pattern found)"
done
Repository: MFlowCode/MFC; length of output: 14827

🏁 Script executed:
# Search for documentation or comments about the MPI launcher choice
rg -i "mpirun|srun.*mpi|gpu.*mpi" toolchain/ docs/ --max-count=5
Repository: MFlowCode/MFC; length of output: 5158

🏁 Script executed:
# Check if there's any cluster-specific documentation
find . -name "*turing*" -type f | head -10
Repository: MFlowCode/MFC; length of output: 90

🏁 Script executed:
# Check if there's any Turing cluster-specific documentation or comments
find . -type f \( -name "*.md" -o -name "*.txt" -o -name "*.sh" \) | xargs grep -l -i "turing" | head -10
Repository: MFlowCode/MFC; length of output: 138

🏁 Script executed:
# Look at hipergator.mako more closely since it also uses NVHPC
cat -n toolchain/templates/hipergator.mako | sed -n '49,64p'
Repository: MFlowCode/MFC; length of output: 860

Fix inconsistency: the GPU MPI path should use srun --mpi=pmi2, as in the suggested fix below.

The GPU execution (lines 50-53) unusually combines srun with $MPI_HOME/bin/mpirun, while the non-GPU branch launches with srun --mpi=pmi2 alone.

Suggested fix:

% if gpu_enabled:
(set -x; ${profiler} \
- srun --gres=gpu:1 -C "A30|A100" \
- $MPI_HOME/bin/mpirun --np ${nodes*tasks_per_node} \
+ srun --mpi=pmi2 --gres=gpu:1 -C "A30|A100" \
+ --ntasks=${nodes*tasks_per_node} \
"${target.get_install_binpath(case)}")

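
To make the launcher change proposed in the review comment above concrete, here is a minimal dry-run sketch that only prints the launch line the suggested GPU branch would produce. The function name `build_launch_cmd`, the sample values, and the binary path `./simulation` are hypothetical and not part of the PR. Whether the Turing cluster's Slurm build actually supports `--mpi=pmi2` is an assumption taken from the suggestion, not something verified here.

```shell
#!/bin/sh
# Hypothetical helper (not part of MFC): print the srun launch line that the
# suggested fix would generate, without running anything on a cluster.
build_launch_cmd() {
    nodes="$1"            # number of nodes (mirrors ${nodes} in the template)
    tasks_per_node="$2"   # MPI ranks per node (mirrors ${tasks_per_node})
    binary="$3"           # path to the executable (placeholder)

    # Total rank count, as in the template expression ${nodes*tasks_per_node}.
    ntasks=$(( nodes * tasks_per_node ))

    # A single launcher (srun) instead of mpirun nested inside srun.
    printf 'srun --mpi=pmi2 --gres=gpu:1 -C "A30|A100" --ntasks=%s %s\n' \
        "$ntasks" "$binary"
}

# Example: 2 nodes with 4 ranks each gives --ntasks=8.
build_launch_cmd 2 4 ./simulation
```

Printing the command rather than executing it keeps the sketch runnable on a machine without Slurm; on a real cluster the printed line would replace the nested `srun ... mpirun` invocation.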
| % else: | ||
| (set -x; ${profiler} \ | ||
| srun --mpi=pmi2 --ntasks=${nodes*tasks_per_node} \ | ||
| "${target.get_install_binpath(case)}") | ||
| % endif | ||
| % endif | ||
|
|
||
| ${helpers.run_epilogue(target)} | ||
|
|
||
| echo | ||
| % endfor | ||
|
|
||
| ${helpers.template_epilogue()} | ||
Minor: the mail-type value contains spaces, which may cause parsing issues.
The mail-type directive is written as "BEGIN, END, FAIL". Some Slurm versions expect comma-separated values without spaces.
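
The page does not preserve the body of the proposed fix, so the following is only a sketch of what a space-free directive could look like. The helper name `normalize_mail_type` is hypothetical. It strips all whitespace from a comma-separated list before the value is written into the `#SBATCH --mail-type` line.

```shell
#!/bin/sh
# Hypothetical helper (not part of MFC): remove all whitespace from a
# comma-separated Slurm mail-type list.
normalize_mail_type() {
    printf '%s' "$1" | tr -d '[:space:]'
}

# Value as it appears in the template under review.
raw_mail_type="BEGIN, END, FAIL"

# Emit the directive with the normalized value.
echo "#SBATCH --mail-type=$(normalize_mail_type "$raw_mail_type")"
# prints: #SBATCH --mail-type=BEGIN,END,FAIL
```

In the template itself the simpler change would be to write the literal value without spaces; the helper only illustrates the normalization.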