Universal Adversarial Watermarking for the AI Era.
Cloak is an open-source CLI tool that applies adversarial perturbations across five data modalities — audio, text, tabular, image, and video — to degrade AI model performance on your data while preserving fidelity for human consumers. All processing runs locally on consumer hardware.
pip install -e ".[dev]"

# Image: PGD attack against CLIP encoder (targeted, Nightshade-style)
cloak apply photo.png --type image --strength high
# Audio: PGD attack against Whisper encoder with psychoacoustic masking
cloak apply podcast.wav --type audio --strength medium
# Text: Homoglyph injection + semantic shifting (Ollama) + structural perturbation
cloak apply article.md --type text --method all
# Tabular: Gaussian noise + correlation breaking + categorical swapping
cloak apply data.csv --type tabular --inplace
# Video: Per-frame image cloaking with temporal warm-start + audio cloaking
cloak apply clip.mp4 --type video --strength medium
# Batch: Process entire directory
cloak apply ./assets/ --type image --strength high

| Modality | Mechanism | Target Encoder |
|---|---|---|
| Image | Targeted PGD against OpenCLIP ViT-B/32 + LPIPS quality constraint | CLIP |
| Video | Per-frame image PGD with warm-start temporal consistency + audio cloaking | CLIP + Whisper |
| Audio | PGD against quantized Whisper + psychoacoustic masking | Whisper |
| Text | Homoglyph injection, LLM semantic shifting (Ollama), zero-width structural perturbation | BPE tokenizers + SBERT |
| Tabular | Bounded Gaussian noise + correlation breaking + categorical proximity swapping | ML classifiers |
Tested against real AI models via Bedrock proxy (Gemma 3 27B, Qwen3 VL 235B) and local evaluation (Whisper, sklearn RandomForest). All numbers are from automated validation on programmatically generated test assets.
| Strength | CLIP Cosine Sim | LPIPS | SSIM | Description Sim vs Qwen3 VL |
|---|---|---|---|---|
| Low | 0.980 | 0.001 | 0.983 | 0.909 |
| Medium | 0.912 | 0.021 | 0.879 | 0.908 |
| High | 0.934 | 0.091 | 0.723 | 0.877 |
The targeted PGD (Nightshade-style) shifts CLIP embeddings and applies visible perturbation at high strength (SSIM 0.72), but modern vision LLMs (Qwen3 VL 235B) remain robust — description similarity stays above 0.87. The attack is effective against the targeted encoder (CLIP) but transfers poorly to larger multimodal models.
| Strength | SBERT Embed Sim | Summary Sim vs Gemma 3 27B | Homoglyphs | Semantic Edits |
|---|---|---|---|---|
| Medium | 0.888 | 0.941 | 13 | 1 |
| High | 0.920 | 0.940 | 19 | 0 |
Homoglyph injection and zero-width character insertion disrupt BPE tokenizers but modern LLMs handle Unicode gracefully. The semantic shifter (via Ollama) is constrained by the SBERT similarity threshold — most rewrites are rejected to preserve readability. Text cloaking is most effective against embedding-based RAG retrieval, less effective against direct LLM comprehension.
| Strength | ML Accuracy Drop | Correlation Frobenius | Categorical Swaps |
|---|---|---|---|
| Medium | 9.0% | 0.172 | 55 |
| High | 17.6% | 0.091 | 135 |
Tabular cloaking degrades RandomForest classifier accuracy by up to 17.6% at high strength while preserving macro statistics (mean, std within tolerance). The correlation-breaking shuffle disrupts micro-level patterns that ML models exploit. Close to the 20% target but constrained by the macro-stat preservation requirement.
| Strength | PESQ | SNR (dB) | L-inf | Est. WER Increase |
|---|---|---|---|---|
| Medium | 4.61 | 48.4 | 0.005 | 50% |
Audio perturbation achieves excellent perceptual quality (PESQ 4.61, well above the 3.5 threshold) with estimated 50% WER increase against Whisper. The psychoacoustic masking concentrates perturbations in inaudible frequency bands.
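To illustrate the masking idea, here is a minimal NumPy sketch of spectral noise shaping: each frequency bin of the perturbation is capped at a small fraction of the signal's magnitude in that bin, so noise hides under loud components. This is an illustrative stand-in, not Cloak's actual STFT-based masker, and the `ratio`/`floor` parameters are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.arange(2048) / 16000.0
signal = np.sin(2 * np.pi * 440.0 * t)           # a pure tone as stand-in audio
noise = 0.1 * rng.standard_normal(signal.shape)  # raw adversarial perturbation

def mask_noise(signal, noise, ratio=0.01, floor=1e-3):
    """Cap each noise frequency bin at a fraction of the signal's magnitude."""
    S = np.fft.rfft(signal)
    N = np.fft.rfft(noise)
    limit = ratio * np.abs(S) + floor
    scale = np.minimum(1.0, limit / (np.abs(N) + 1e-12))
    return np.fft.irfft(N * scale, n=len(signal))

def snr_db(clean, pert):
    return 20 * np.log10(np.linalg.norm(clean) / (np.linalg.norm(pert) + 1e-12))

masked = mask_noise(signal, noise)
print(f"SNR before: {snr_db(signal, noise):.1f} dB, after: {snr_db(signal, masked):.1f} dB")
```

The shaped noise keeps most of its energy near the tone's frequency, where the ear is least sensitive to it, which is why the SNR jumps while some perturbation survives.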
- Vision model robustness: Modern multimodal LLMs (Qwen3 VL, GPT-4V) are significantly more robust to adversarial image perturbations than the targeted CLIP encoder. Attacks that shift CLIP embeddings don't necessarily fool larger vision models.
- Text semantic shifting: The SBERT similarity threshold constrains how much semantic content can change. Aggressive rewriting that would fool LLMs also makes text noticeably different to humans.
- Tabular macro-stat constraint: Preserving macro statistics (mean, std) limits how much noise can be added, capping ML accuracy degradation at ~18% for high strength.
- Audio evaluation: Local Whisper WER evaluation requires ffmpeg in PATH. The Bedrock proxy doesn't support audio multimodal input.
- No GPU acceleration tested: All benchmarks run on CPU (Apple Silicon). GPU would significantly speed up image/video PGD.
| Flag | Description | Default |
|---|---|---|
| --type | Data modality: audio, text, tabular, image, video | Required |
| --strength | Perturbation level: low, medium, high | medium |
| --method | Text method: homoglyph, semantic, structural, all | all |
| --inplace | Overwrite original file | false |
| --output | Explicit output path | auto |
| --seed | Random seed for reproducibility | random |
| Level | Image ε | Image PGD Iters | Tabular Tolerance | Text Homoglyph Density |
|---|---|---|---|---|
| Low | 2/255 | 10 | 3% | 5% |
| Medium | 8/255 | 50 | 5% | 10% |
| High | 16/255 | 100 | 10% | 15% |
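The table above maps directly onto a configuration structure. A hypothetical sketch of how these profiles might be encoded (key names are illustrative, not Cloak's actual internals):

```python
# Strength profiles from the table above; keys are illustrative names.
PROFILES = {
    "low":    {"image_eps": 2 / 255,  "pgd_iters": 10,  "tab_tol": 0.03, "homoglyph_density": 0.05},
    "medium": {"image_eps": 8 / 255,  "pgd_iters": 50,  "tab_tol": 0.05, "homoglyph_density": 0.10},
    "high":   {"image_eps": 16 / 255, "pgd_iters": 100, "tab_tol": 0.10, "homoglyph_density": 0.15},
}

def profile(strength="medium"):
    """Look up the perturbation parameters for a strength level."""
    return PROFILES[strength]
```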
# Run fast unit + property tests (no model downloads)
pytest tests/ -v
# Run all tests including slow (requires model downloads)
pytest tests/ -v -m ""
# Run effectiveness tests against real AI models (requires API credentials)
pytest tests/ -m effectiveness -v
# Run with coverage
pytest tests/ --cov=cloak --cov-report=term-missing

171+ tests covering property-based testing (Hypothesis), unit tests, and integration tests across all modalities.
cloak/
├── cli.py # Typer CLI entry point
├── models.py # Enums, configs, metrics, strength profiles
├── exceptions.py # Custom exception hierarchy
├── io.py # File discovery and I/O (all formats)
├── image/
│ ├── clip_encoder.py # OpenCLIP ViT-B/32 wrapper (image + text encoding)
│ └── engine.py # Targeted PGD attack + LPIPS quality validation
├── video/
│ └── engine.py # Per-frame cloaking + temporal warm-start + audio
├── audio/
│ ├── engine.py # PGD attack against Whisper + quality retry
│ └── masker.py # Psychoacoustic masking (STFT-based)
├── text/
│ ├── engine.py # Text cloaking orchestrator
│ ├── homoglyph.py # Unicode confusable injection
│ ├── semantic.py # LLM synonym replacement (Ollama backend)
│ └── structural.py # Zero-width character insertion
└── tabular/
└── engine.py # Gaussian noise + correlation breaking + categorical swapping
Instead of just pushing the image embedding away from the original (untargeted), Cloak pushes it toward a completely different concept's CLIP text embedding (targeted). For example, an image of geometric shapes gets pushed toward "a photograph of a sunset over the ocean." This targeted approach is 2-5x more effective than untargeted PGD at the same perturbation budget.
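A minimal sketch of the targeted-PGD loop, assuming PyTorch. A tiny random linear layer stands in for the frozen CLIP image encoder, and a random unit vector for the target concept's text embedding; the real tool uses OpenCLIP ViT-B/32, and all names here are illustrative.

```python
import torch

torch.manual_seed(0)
encoder = torch.nn.Linear(3 * 32 * 32, 64)  # stand-in for a frozen image encoder
target = torch.nn.functional.normalize(torch.randn(64), dim=0)  # "sunset" embedding

def targeted_pgd(x, eps=8 / 255, alpha=2 / 255, iters=50):
    """Push encoder(x) toward `target` under an L-inf budget of eps."""
    x_adv = x.clone()
    for _ in range(iters):
        x_adv.requires_grad_(True)
        emb = torch.nn.functional.normalize(encoder(x_adv.flatten()), dim=0)
        loss = -torch.dot(emb, target)  # maximize cosine sim to the target concept
        loss.backward()
        with torch.no_grad():
            x_adv = x_adv - alpha * x_adv.grad.sign()  # signed gradient step
            x_adv = x + (x_adv - x).clamp(-eps, eps)   # project into the eps-ball
            x_adv = x_adv.clamp(0, 1)                  # keep valid pixel range
    return x_adv.detach()

x = torch.rand(3, 32, 32)
x_adv = targeted_pgd(x)
```

The projection step is what keeps the perturbation within the strength profile's ε budget; the targeted loss is why the embedding lands near a specific foreign concept rather than merely drifting away from the original.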
Three independent perturbation layers:
- Homoglyph injection: Replace Latin characters with visually identical Unicode confusables (Cyrillic, fullwidth) to break BPE tokenization
- Semantic shifting: Use a local LLM (Ollama) to rewrite sentences with adversarial vocabulary while preserving meaning
- Structural perturbation: Insert invisible zero-width Unicode characters at token boundaries
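Two of these layers can be sketched in a few lines of pure Python. The confusable map below is a tiny illustrative sample, not Cloak's full tables, and the function names are invented for the example.

```python
import random

# A few Cyrillic lookalikes for Latin letters (illustrative subset).
HOMOGLYPHS = {"a": "\u0430", "e": "\u0435", "o": "\u043e", "p": "\u0440", "c": "\u0441"}
ZWSP = "\u200b"  # zero-width space, invisible when rendered

def inject_homoglyphs(text, density=0.1, seed=0):
    """Swap a fraction of mapped characters for visually identical confusables."""
    rng = random.Random(seed)
    return "".join(
        HOMOGLYPHS[ch] if ch in HOMOGLYPHS and rng.random() < density else ch
        for ch in text
    )

def insert_zero_width(text):
    """Drop an invisible character after each space (a crude token boundary)."""
    return text.replace(" ", " " + ZWSP)

cloaked = insert_zero_width(inject_homoglyphs("a peaceful ocean scene", density=1.0))
```

The result renders identically to a human reader, but a BPE tokenizer sees unfamiliar byte sequences where it expected common subwords.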
- Gaussian noise: Calibrated per-column noise that preserves macro statistics (mean, std) while scrambling micro-level patterns
- Correlation breaking: Shuffle rows within correlated column pairs to destroy the statistical relationships ML models exploit
- Categorical swapping: Replace categorical values with proximity-based neighbors
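The first two numeric layers can be sketched with NumPy on a toy two-column table. This is a rough illustration under assumed parameters (`noise_frac`, `shuffle_frac` are invented names), not Cloak's engine.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=1000)
y = 0.9 * x + 0.1 * rng.normal(size=1000)  # strongly correlated column pair
table = np.column_stack([x, y])

def cloak_tabular(table, noise_frac=0.05, shuffle_frac=0.3, seed=0):
    rng = np.random.default_rng(seed)
    out = table.copy()
    # 1) Bounded Gaussian noise, scaled per column so mean/std barely move.
    out += rng.normal(scale=noise_frac * out.std(axis=0), size=out.shape)
    # 2) Correlation breaking: shuffle one column's values for a subset of rows.
    rows = rng.choice(len(out), size=int(shuffle_frac * len(out)), replace=False)
    out[rows, 1] = rng.permutation(out[rows, 1])
    return out

def corr(t):
    return np.corrcoef(t[:, 0], t[:, 1])[0, 1]

cloaked = cloak_tabular(table)
```

Shuffling within a column preserves that column's marginal distribution exactly, which is why macro statistics survive while the cross-column correlation that models exploit is destroyed.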
All open-source with permissive licenses:
| Package | License | Purpose |
|---|---|---|
| open-clip-torch | MIT | CLIP encoder for image attacks |
| lpips | BSD-2 | Perceptual similarity metric |
| openai-whisper | MIT | Audio encoder for PGD attacks |
| sentence-transformers | Apache-2.0 | SBERT similarity verification |
| Pillow | HPND | Image I/O |
| opencv-python | Apache-2.0 | Video frame extraction |
| scikit-image | BSD-3 | SSIM computation |
| torch | BSD-3 | Gradient computation |
Nightshade and Glaze are closed-source and are explicitly excluded as dependencies.
- Zero-Training Infrastructure: White-box attacks against existing encoders — no model training required
- Local Execution First: All processing runs locally, no data leaves your machine
- Defense in Depth: Multiple perturbation layers per modality
- Quality Preservation: All perturbations bounded by perceptual quality metrics (LPIPS, SSIM, PESQ, SBERT)
MIT