Universal Adversarial Watermarking for the AI Era.
Cloak is an open-source CLI tool that applies adversarial perturbations across five data modalities — audio, text, tabular, image, and video — to degrade AI model performance on your data while preserving fidelity for human consumers. All processing runs locally on consumer hardware.
pip install -e ".[dev]"

# Image: PGD attack against CLIP encoder (targeted, Nightshade-style)
cloak apply photo.png --type image --strength high
# Audio: PGD attack against Whisper encoder with psychoacoustic masking
cloak apply podcast.wav --type audio --strength medium
# Text: Homoglyph injection + semantic shifting (Ollama) + structural perturbation
cloak apply article.md --type text --method all
# Tabular: Gaussian noise + correlation breaking + categorical swapping
cloak apply data.csv --type tabular --inplace
# Video: Per-frame image cloaking with temporal warm-start + audio cloaking
cloak apply clip.mp4 --type video --strength medium
# Batch: Process entire directory
cloak apply ./assets/ --type image --strength high

| Modality | Mechanism | Target Encoder |
|---|---|---|
| Image | Targeted PGD against OpenCLIP ViT-B/32 + LPIPS quality constraint | CLIP |
| Video | Per-frame image PGD with warm-start temporal consistency + audio cloaking | CLIP + Whisper |
| Audio | PGD against quantized Whisper + psychoacoustic masking | Whisper |
| Text | Homoglyph injection, LLM semantic shifting (Ollama), zero-width structural perturbation | BPE tokenizers + SBERT |
| Tabular | Bounded Gaussian noise + correlation breaking + categorical proximity swapping | ML classifiers |
Tested against real AI models via Bedrock proxy (Gemma 3 27B, Qwen3 VL 235B) and local evaluation (Whisper, sklearn RandomForest). All numbers are from automated validation on programmatically generated test assets.
| Strength | CLIP Cosine Sim | LPIPS | SSIM | Description Sim vs Qwen3 VL |
|---|---|---|---|---|
| Low | 0.980 | 0.001 | 0.983 | 0.909 |
| Medium | 0.912 | 0.021 | 0.879 | 0.908 |
| High | 0.934 | 0.091 | 0.723 | 0.877 |
The targeted PGD (Nightshade-style) shifts CLIP embeddings and applies visible perturbation at high strength (SSIM 0.72), but modern vision LLMs (Qwen3 VL 235B) remain robust — description similarity stays above 0.87. The attack is effective against the targeted encoder (CLIP) but transfers poorly to larger multimodal models.
| Strength | SBERT Embed Sim | Summary Sim vs Gemma 3 27B | Homoglyphs | Semantic Edits |
|---|---|---|---|---|
| Medium | 0.888 | 0.941 | 13 | 1 |
| High | 0.920 | 0.940 | 19 | 0 |
Homoglyph injection and zero-width character insertion disrupt BPE tokenizers but modern LLMs handle Unicode gracefully. The semantic shifter (via Ollama) is constrained by the SBERT similarity threshold — most rewrites are rejected to preserve readability. Text cloaking is most effective against embedding-based RAG retrieval, less effective against direct LLM comprehension.
| Strength | ML Accuracy Drop | Correlation Frobenius | Categorical Swaps |
|---|---|---|---|
| Medium | 9.0% | 0.172 | 55 |
| High | 17.6% | 0.091 | 135 |
Tabular cloaking degrades RandomForest classifier accuracy by up to 17.6% at high strength while preserving macro statistics (mean, std within tolerance). The correlation-breaking shuffle disrupts micro-level patterns that ML models exploit. Close to the 20% target but constrained by the macro-stat preservation requirement.
| Strength | PESQ | SNR (dB) | L-inf | Est. WER Increase |
|---|---|---|---|---|
| Medium | 4.61 | 48.4 | 0.005 | 50% |
Audio perturbation achieves excellent perceptual quality (PESQ 4.61, well above the 3.5 threshold) with estimated 50% WER increase against Whisper. The psychoacoustic masking concentrates perturbations in inaudible frequency bands.
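To illustrate the masking idea, here is a minimal NumPy sketch of spectral noise shaping: each frequency bin of the perturbation is capped at a small fraction of the signal's magnitude in that bin, so noise hides under loud components. This is an illustrative stand-in, not Cloak's actual STFT-based masker, and the `ratio`/`floor` parameters are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.arange(2048) / 16000.0
signal = np.sin(2 * np.pi * 440.0 * t)           # a pure tone as stand-in audio
noise = 0.1 * rng.standard_normal(signal.shape)  # raw adversarial perturbation

def mask_noise(signal, noise, ratio=0.01, floor=1e-3):
    """Cap each noise frequency bin at a fraction of the signal's magnitude."""
    S = np.fft.rfft(signal)
    N = np.fft.rfft(noise)
    limit = ratio * np.abs(S) + floor
    scale = np.minimum(1.0, limit / (np.abs(N) + 1e-12))
    return np.fft.irfft(N * scale, n=len(signal))

def snr_db(clean, pert):
    return 20 * np.log10(np.linalg.norm(clean) / (np.linalg.norm(pert) + 1e-12))

masked = mask_noise(signal, noise)
print(f"SNR before: {snr_db(signal, noise):.1f} dB, after: {snr_db(signal, masked):.1f} dB")
```

The shaped noise keeps most of its energy near the tone's frequency, where the ear is least sensitive to it, which is why the SNR jumps while some perturbation survives.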
- Vision model robustness: Modern multimodal LLMs (Qwen3 VL, GPT-4V) are significantly more robust to adversarial image perturbations than the targeted CLIP encoder. Attacks that shift CLIP embeddings don't necessarily fool larger vision models.
- Text semantic shifting: The SBERT similarity threshold constrains how much semantic content can change. Aggressive rewriting that would fool LLMs also makes text noticeably different to humans.
- Tabular macro-stat constraint: Preserving macro statistics (mean, std) limits how much noise can be added, capping ML accuracy degradation at ~18% for high strength.
- Audio evaluation: Local Whisper WER evaluation requires ffmpeg in PATH. The Bedrock proxy doesn't support audio multimodal input.
- No GPU acceleration tested: All benchmarks run on CPU (Apple Silicon). GPU would significantly speed up image/video PGD.
| Flag | Description | Default |
|---|---|---|
| --type | Data modality: audio, text, tabular, image, video | Required |
| --strength | Perturbation level: low, medium, high | medium |
| --method | Text method: homoglyph, semantic, structural, all | all |
| --inplace | Overwrite original file | false |
| --output | Explicit output path | auto |
| --seed | Random seed for reproducibility | random |
| Level | Image ε | Image PGD Iters | Tabular Tolerance | Text Homoglyph Density |
|---|---|---|---|---|
| Low | 2/255 | 10 | 3% | 5% |
| Medium | 8/255 | 50 | 5% | 10% |
| High | 16/255 | 100 | 10% | 15% |
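The table above maps directly onto a configuration structure. A hypothetical sketch of how these profiles might be encoded (key names are illustrative, not Cloak's actual internals):

```python
# Strength profiles from the table above; keys are illustrative names.
PROFILES = {
    "low":    {"image_eps": 2 / 255,  "pgd_iters": 10,  "tab_tol": 0.03, "homoglyph_density": 0.05},
    "medium": {"image_eps": 8 / 255,  "pgd_iters": 50,  "tab_tol": 0.05, "homoglyph_density": 0.10},
    "high":   {"image_eps": 16 / 255, "pgd_iters": 100, "tab_tol": 0.10, "homoglyph_density": 0.15},
}

def profile(strength="medium"):
    """Look up the perturbation parameters for a strength level."""
    return PROFILES[strength]
```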
# Run fast unit + property tests (no model downloads)
pytest tests/ -v
# Run all tests including slow (requires model downloads)
pytest tests/ -v -m ""
# Run effectiveness tests against real AI models (requires API credentials)
pytest tests/ -m effectiveness -v
# Run with coverage
pytest tests/ --cov=cloak --cov-report=term-missing

171+ tests covering property-based testing (Hypothesis), unit tests, and integration tests across all modalities.
cloak/
├── cli.py # Typer CLI entry point
├── models.py # Enums, configs, metrics, strength profiles
├── exceptions.py # Custom exception hierarchy
├── io.py # File discovery and I/O (all formats)
├── image/
│ ├── clip_encoder.py # OpenCLIP ViT-B/32 wrapper (image + text encoding)
│ └── engine.py # Targeted PGD attack + LPIPS quality validation
├── video/
│ └── engine.py # Per-frame cloaking + temporal warm-start + audio
├── audio/
│ ├── engine.py # PGD attack against Whisper + quality retry
│ └── masker.py # Psychoacoustic masking (STFT-based)
├── text/
│ ├── engine.py # Text cloaking orchestrator
│ ├── homoglyph.py # Unicode confusable injection
│ ├── semantic.py # LLM synonym replacement (Ollama backend)
│ └── structural.py # Zero-width character insertion
└── tabular/
└── engine.py # Gaussian noise + correlation breaking + categorical swapping
Instead of just pushing the image embedding away from the original (untargeted), Cloak pushes it toward a completely different concept's CLIP text embedding (targeted). For example, an image of geometric shapes gets pushed toward "a photograph of a sunset over the ocean." This targeted approach is 2-5x more effective than untargeted PGD at the same perturbation budget.
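A minimal sketch of the targeted-PGD loop, assuming PyTorch. A tiny random linear layer stands in for the frozen CLIP image encoder, and a random unit vector for the target concept's text embedding; the real tool uses OpenCLIP ViT-B/32, and all names here are illustrative.

```python
import torch

torch.manual_seed(0)
encoder = torch.nn.Linear(3 * 32 * 32, 64)  # stand-in for a frozen image encoder
target = torch.nn.functional.normalize(torch.randn(64), dim=0)  # "sunset" embedding

def targeted_pgd(x, eps=8 / 255, alpha=2 / 255, iters=50):
    """Push encoder(x) toward `target` under an L-inf budget of eps."""
    x_adv = x.clone()
    for _ in range(iters):
        x_adv.requires_grad_(True)
        emb = torch.nn.functional.normalize(encoder(x_adv.flatten()), dim=0)
        loss = -torch.dot(emb, target)  # maximize cosine sim to the target concept
        loss.backward()
        with torch.no_grad():
            x_adv = x_adv - alpha * x_adv.grad.sign()  # signed gradient step
            x_adv = x + (x_adv - x).clamp(-eps, eps)   # project into the eps-ball
            x_adv = x_adv.clamp(0, 1)                  # keep valid pixel range
    return x_adv.detach()

x = torch.rand(3, 32, 32)
x_adv = targeted_pgd(x)
```

The projection step is what keeps the perturbation within the strength profile's ε budget; the targeted loss is why the embedding lands near a specific foreign concept rather than merely drifting away from the original.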
Three independent perturbation layers:
- Homoglyph injection: Replace Latin characters with visually identical Unicode confusables (Cyrillic, fullwidth) to break BPE tokenization
- Semantic shifting: Use a local LLM (Ollama) to rewrite sentences with adversarial vocabulary while preserving meaning
- Structural perturbation: Insert invisible zero-width Unicode characters at token boundaries
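Two of these layers can be sketched in a few lines of pure Python. The confusable map below is a tiny illustrative sample, not Cloak's full tables, and the function names are invented for the example.

```python
import random

# A few Cyrillic lookalikes for Latin letters (illustrative subset).
HOMOGLYPHS = {"a": "\u0430", "e": "\u0435", "o": "\u043e", "p": "\u0440", "c": "\u0441"}
ZWSP = "\u200b"  # zero-width space, invisible when rendered

def inject_homoglyphs(text, density=0.1, seed=0):
    """Swap a fraction of mapped characters for visually identical confusables."""
    rng = random.Random(seed)
    return "".join(
        HOMOGLYPHS[ch] if ch in HOMOGLYPHS and rng.random() < density else ch
        for ch in text
    )

def insert_zero_width(text):
    """Drop an invisible character after each space (a crude token boundary)."""
    return text.replace(" ", " " + ZWSP)

cloaked = insert_zero_width(inject_homoglyphs("a peaceful ocean scene", density=1.0))
```

The result renders identically to a human reader, but a BPE tokenizer sees unfamiliar byte sequences where it expected common subwords.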
- Gaussian noise: Calibrated per-column noise that preserves macro statistics (mean, std) while scrambling micro-level patterns
- Correlation breaking: Shuffle rows within correlated column pairs to destroy the statistical relationships ML models exploit
- Categorical swapping: Replace categorical values with proximity-based neighbors
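The first two numeric layers can be sketched with NumPy on a toy two-column table. This is a rough illustration under assumed parameters (`noise_frac`, `shuffle_frac` are invented names), not Cloak's engine.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=1000)
y = 0.9 * x + 0.1 * rng.normal(size=1000)  # strongly correlated column pair
table = np.column_stack([x, y])

def cloak_tabular(table, noise_frac=0.05, shuffle_frac=0.3, seed=0):
    rng = np.random.default_rng(seed)
    out = table.copy()
    # 1) Bounded Gaussian noise, scaled per column so mean/std barely move.
    out += rng.normal(scale=noise_frac * out.std(axis=0), size=out.shape)
    # 2) Correlation breaking: shuffle one column's values for a subset of rows.
    rows = rng.choice(len(out), size=int(shuffle_frac * len(out)), replace=False)
    out[rows, 1] = rng.permutation(out[rows, 1])
    return out

def corr(t):
    return np.corrcoef(t[:, 0], t[:, 1])[0, 1]

cloaked = cloak_tabular(table)
```

Shuffling within a column preserves that column's marginal distribution exactly, which is why macro statistics survive while the cross-column correlation that models exploit is destroyed.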
All open-source with permissive licenses:
| Package | License | Purpose |
|---|---|---|
| open-clip-torch | MIT | CLIP encoder for image attacks |
| lpips | BSD-2 | Perceptual similarity metric |
| openai-whisper | MIT | Audio encoder for PGD attacks |
| sentence-transformers | Apache-2.0 | SBERT similarity verification |
| Pillow | HPND | Image I/O |
| opencv-python | Apache-2.0 | Video frame extraction |
| scikit-image | BSD-3 | SSIM computation |
| torch | BSD-3 | Gradient computation |
Nightshade and Glaze are closed-source and are explicitly excluded as dependencies.
- Zero-Training Infrastructure: White-box attacks against existing encoders — no model training required
- Local Execution First: All processing runs locally, no data leaves your machine
- Defense in Depth: Multiple perturbation layers per modality
- Quality Preservation: All perturbations bounded by perceptual quality metrics (LPIPS, SSIM, PESQ, SBERT)
MIT