Deep-dive into the deployment of an on-premise low-privilege...#2043

Open
carlospolop wants to merge 1 commit into master from
update_Deep-dive_into_the_deployment_of_an_on-premise_low_20260320_185529

Conversation

@carlospolop
Collaborator

🤖 Automated Content Update

This PR was automatically generated by the HackTricks News Bot based on a technical blog post.

📝 Source Information

🎯 Content Summary

Title/Context
The post (Synacktiv, published 20/03/2026) documents a security-first, low-privilege deployment of an on‑premise LLM inference server intended for confidential business data. The core goals are: air-gapped instances (no data exfil via network), strong isolation between teams/projects (one GPU per project), and minimal host attack surface. The stack used throughout the article is:
OS: Debian 13 (hardened)
Inf...

🔧 Technical Details

Reduce inference-server data exposure by allowlisting API endpoints: Treat LLM inference servers as sensitive multi-user services. Debug/monitoring routes can expose internal state (e.g., a historical llama.cpp /slots endpoint that leaked full prompt contents when slot debugging was enabled). Place a reverse proxy in front of the server and enforce a strict deny-by-default allowlist (e.g., nginx map returning 403 for non-audited routes). Additionally disable server-side slot monitoring (--no-slots) so prompt/slot inspection endpoints cannot be used to exfiltrate user prompts and confidential inputs.
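The deny-by-default pattern described above can be sketched as an nginx fragment. This is a minimal illustration, not the configuration from the post: the allowlisted routes, upstream socket path, and variable name are assumptions to be adapted to the actual inference server's API.

```nginx
# Deny-by-default route allowlist for an LLM inference server.
# Route names and socket path below are illustrative assumptions.
map $uri $llm_route_allowed {
    default                 0;  # everything not listed is denied
    "/health"               1;
    "/v1/completions"       1;
    "/v1/chat/completions"  1;
    # note: debug routes like /slots are deliberately NOT listed
}

server {
    listen 443 ssl;

    location / {
        # Return 403 for any route that has not been audited and allowlisted
        if ($llm_route_allowed = 0) { return 403; }
        proxy_pass http://unix:/run/llama/llama.sock;
    }
}
```

Combined with starting the server with `--no-slots`, this gives two independent layers: the proxy refuses to forward requests to unaudited routes, and the server itself no longer exposes slot/prompt state even if the proxy is bypassed.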

Run LLM servers in rootless containers without networking using UNIX sockets: If the inference server can listen on a UNIX socket, run containers with --network=none to eliminate TCP/IP exposure and remove the need for user-mode networking daemons (e.g., pasta). Control access with POSIX per...
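A command-line sketch of the rootless, network-less pattern above, under stated assumptions: the image name, socket directory, group name, and GPU device nodes are illustrative, and the exact flag syntax for binding an inference server to a UNIX socket varies by server.

```
# Hypothetical rootless Podman invocation; names and paths are illustrative.
#
# --network=none : no TCP/IP stack inside the container at all,
#                  so no user-mode networking daemon (e.g., pasta) is needed
# -v             : shared directory holding the UNIX socket
# --device       : expose only the GPU device nodes this project needs
podman run --rm \
  --network=none \
  -v /run/llama:/run/llama \
  --device /dev/nvidia0 \
  --device /dev/nvidiactl \
  localhost/llama-server:latest \
  --no-slots

# Gate access with POSIX permissions on the socket directory:
# only members of the project's group can connect to the socket.
chgrp project-a /run/llama
chmod 770 /run/llama
```

With no network namespace attached, the only way in or out of the container is the socket on the shared volume, which reduces the exfiltration surface to ordinary filesystem access control.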

🤖 Agent Actions

Updated AI-Risk-Frameworks.md with a new AI-specific section on self-hosted LLM inference hardening.

The added content focuses on the non-duplicative parts from the Synacktiv post: prompt leakage through debug endpoints like /slots, deny-by-default reverse-proxy allowlisting, rootless Podman with --network none over UNIX sockets, GPU device-node minimization including nvidia-uvm risk, and AppArmor/LSM confinement. I kept it in a single file and updated the references at the end of AI-Risk-Frameworks.md.
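The AppArmor/LSM confinement mentioned above can be illustrated with a minimal profile sketch. This is a hypothetical example, not the profile from the post: the binary path, model directory, socket path, and device nodes are all assumptions.

```
# Hypothetical AppArmor profile sketch for a confined inference server.
# All paths and the profile name are illustrative.
abi <abi/3.0>,
include <tunables/global>

profile llama-server /usr/local/bin/llama-server {
  include <abstractions/base>

  # Model weights: read-only
  /srv/models/** r,

  # UNIX socket used instead of TCP
  /run/llama/llama.sock rw,

  # Only the GPU device nodes this project needs
  /dev/nvidia0   rw,
  /dev/nvidiactl rw,

  # No network access of any kind
  deny network,
}
```

The intent is defense in depth: even if the server process is compromised, the LSM policy denies network sockets and limits filesystem and device access to the minimum the workload needs.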

Verification: checked the final diff and only this file changed.


This PR was automatically created by the HackTricks Feed Bot. Please review the changes carefully before merging.

@carlospolop
Collaborator Author

🔗 Additional Context

Original Blog Post: https://www.synacktiv.com/en/publications/deep-dive-into-the-deployment-of-an-on-premise-low-privileged-llm-server.html

Content Categories: Based on the analysis, this content was categorized under "🤖 AI Security (LLM serving hardening / LLM infrastructure attack surface) and/or 🐧 Linux Privilege Escalation -> Docker Security (rootless Podman, --network=none + UNIX sockets, AppArmor profiles); plus a note under 🕸️ Pentesting Web (reverse-proxy endpoint allowlisting to prevent unintended debug/info-leak routes)".

Repository Maintenance:

  • MD Files Formatting: 954 files processed

Review Notes:

  • This content was automatically processed and may require human review for accuracy
  • Check that the placement within the repository structure is appropriate
  • Verify that all technical details are correct and up-to-date
  • All .md files have been checked for proper formatting (headers, includes, etc.)

Bot Version: HackTricks News Bot v1.0
