# Cloak CLI

Universal Adversarial Watermarking for the AI Era.

Cloak is an open-source CLI tool that applies adversarial perturbations across five data modalities — audio, text, tabular, image, and video — to degrade AI model performance on your data while preserving fidelity for human consumers. All processing runs locally on consumer hardware.

## Install

```bash
pip install -e ".[dev]"
```

## Quick Start

```bash
# Image: PGD attack against CLIP encoder (targeted, Nightshade-style)
cloak apply photo.png --type image --strength high

# Audio: PGD attack against Whisper encoder with psychoacoustic masking
cloak apply podcast.wav --type audio --strength medium

# Text: Homoglyph injection + semantic shifting (Ollama) + structural perturbation
cloak apply article.md --type text --method all

# Tabular: Gaussian noise + correlation breaking + categorical swapping
cloak apply data.csv --type tabular --inplace

# Video: Per-frame image cloaking with temporal warm-start + audio cloaking
cloak apply clip.mp4 --type video --strength medium

# Batch: Process entire directory
cloak apply ./assets/ --type image --strength high
```

## Modalities

| Modality | Mechanism | Target Encoder |
|---|---|---|
| Image | Targeted PGD against OpenCLIP ViT-B/32 + LPIPS quality constraint | CLIP |
| Video | Per-frame image PGD with warm-start temporal consistency + audio cloaking | CLIP + Whisper |
| Audio | PGD against quantized Whisper + psychoacoustic masking | Whisper |
| Text | Homoglyph injection, LLM semantic shifting (Ollama), zero-width structural perturbation | BPE tokenizers + SBERT |
| Tabular | Bounded Gaussian noise + correlation breaking + categorical proximity swapping | ML classifiers |

## v1 Benchmark Results

Tested against real AI models via Bedrock proxy (Gemma 3 27B, Qwen3 VL 235B) and local evaluation (Whisper, sklearn RandomForest). All numbers are from automated validation on programmatically generated test assets.

### Image Cloaking (PGD against CLIP)

| Strength | CLIP Cosine Sim | LPIPS | SSIM | Description Sim vs Qwen3 VL |
|---|---|---|---|---|
| Low | 0.980 | 0.001 | 0.983 | 0.909 |
| Medium | 0.912 | 0.021 | 0.879 | 0.908 |
| High | 0.934 | 0.091 | 0.723 | 0.877 |

The targeted PGD (Nightshade-style) shifts CLIP embeddings and applies visible perturbation at high strength (SSIM 0.72), but modern vision LLMs (Qwen3 VL 235B) remain robust — description similarity stays above 0.87. The attack is effective against the targeted encoder (CLIP) but transfers poorly to larger multimodal models.

### Text Cloaking (Homoglyph + Semantic + Structural)

| Strength | SBERT Embed Sim | Summary Sim vs Gemma 3 27B | Homoglyphs | Semantic Edits |
|---|---|---|---|---|
| Medium | 0.888 | 0.941 | 13 | 1 |
| High | 0.920 | 0.940 | 19 | 0 |

Homoglyph injection and zero-width character insertion disrupt BPE tokenizers but modern LLMs handle Unicode gracefully. The semantic shifter (via Ollama) is constrained by the SBERT similarity threshold — most rewrites are rejected to preserve readability. Text cloaking is most effective against embedding-based RAG retrieval, less effective against direct LLM comprehension.

### Tabular Cloaking (Gaussian Noise + Correlation Breaking)

| Strength | ML Accuracy Drop | Correlation Δ (Frobenius norm) | Categorical Swaps |
|---|---|---|---|
| Medium | 9.0% | 0.172 | 55 |
| High | 17.6% | 0.091 | 135 |

Tabular cloaking degrades RandomForest classifier accuracy by up to 17.6% at high strength while preserving macro statistics (mean, std within tolerance). The correlation-breaking shuffle disrupts micro-level patterns that ML models exploit. Close to the 20% target but constrained by the macro-stat preservation requirement.

### Audio Cloaking (PGD against Whisper)

| Strength | PESQ | SNR (dB) | L-inf | Est. WER Increase |
|---|---|---|---|---|
| Medium | 4.61 | 48.4 | 0.005 | 50% |

Audio perturbation achieves excellent perceptual quality (PESQ 4.61, well above the 3.5 threshold) with estimated 50% WER increase against Whisper. The psychoacoustic masking concentrates perturbations in inaudible frequency bands.
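As a rough illustration of the masking idea, the sketch below scales a raw perturbation by the signal's local loudness, so the noise hides where the signal is loud and nearly vanishes where it is quiet. Real psychoacoustic models operate on STFT frequency bands and masking thresholds; this per-frame RMS proxy and all names in it (`loudness_mask`, the frame size, the test signal) are illustrative assumptions, not Cloak's masker.

```python
import numpy as np

def loudness_mask(signal, frame=256):
    """Per-frame RMS, normalized to [0, 1] — a crude masking-threshold proxy."""
    frames = signal[: len(signal) // frame * frame].reshape(-1, frame)
    rms = np.sqrt((frames ** 2).mean(axis=1, keepdims=True))
    mask = rms / (rms.max() + 1e-12)           # 1.0 in the loudest frame
    return np.repeat(mask, frame)              # expand back to sample resolution

rng = np.random.default_rng(0)
t = np.linspace(0, 1, 4096, endpoint=False)
loud_then_quiet = np.concatenate([np.ones(2048), 0.05 * np.ones(2048)])
signal = np.sin(2 * np.pi * 440 * t) * loud_then_quiet   # loud half, quiet half

raw_pert = 0.005 * rng.standard_normal(4096)
masked_pert = raw_pert * loudness_mask(signal)

# Perturbation energy concentrates in the loud half of the clip.
assert np.abs(masked_pert[:2048]).mean() > 10 * np.abs(masked_pert[2048:]).mean()
```

A real masker would additionally spread the budget across frequency bands according to how strongly each band masks its neighbors.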

## Known Limitations

- **Vision model robustness:** Modern multimodal LLMs (Qwen3 VL, GPT-4V) are significantly more robust to adversarial image perturbations than the targeted CLIP encoder. Attacks that shift CLIP embeddings don't necessarily fool larger vision models.
- **Text semantic shifting:** The SBERT similarity threshold constrains how much semantic content can change. Aggressive rewriting that would fool LLMs also makes text noticeably different to humans.
- **Tabular macro-stat constraint:** Preserving macro statistics (mean, std) limits how much noise can be added, capping ML accuracy degradation at ~18% for high strength.
- **Audio evaluation:** Local Whisper WER evaluation requires ffmpeg in PATH. The Bedrock proxy doesn't support audio multimodal input.
- **No GPU acceleration tested:** All benchmarks run on CPU (Apple Silicon). A GPU would significantly speed up image/video PGD.

## Options

| Flag | Description | Default |
|---|---|---|
| `--type` | Data modality: `audio`, `text`, `tabular`, `image`, `video` | Required |
| `--strength` | Perturbation level: `low`, `medium`, `high` | `medium` |
| `--method` | Text method: `homoglyph`, `semantic`, `structural`, `all` | `all` |
| `--inplace` | Overwrite original file | `false` |
| `--output` | Explicit output path | auto |
| `--seed` | Random seed for reproducibility | random |

## Strength Profiles

| Level | Image ε | Image PGD Iters | Tabular Tolerance | Text Homoglyph Density |
|---|---|---|---|---|
| Low | 2/255 | 10 | 3% | 5% |
| Medium | 8/255 | 50 | 5% | 10% |
| High | 16/255 | 100 | 10% | 15% |
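One way the profiles above could be encoded in code — the field names mirror the table's columns but are illustrative, not Cloak's internal API:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class StrengthProfile:
    image_eps: float          # L-inf pixel budget for image PGD
    image_pgd_iters: int      # PGD iteration count
    tabular_tolerance: float  # noise scale as a fraction of column std
    homoglyph_density: float  # fraction of eligible characters replaced

PROFILES = {
    "low":    StrengthProfile(2 / 255, 10, 0.03, 0.05),
    "medium": StrengthProfile(8 / 255, 50, 0.05, 0.10),
    "high":   StrengthProfile(16 / 255, 100, 0.10, 0.15),
}

assert PROFILES["high"].image_pgd_iters == 100
```

Freezing the dataclass keeps a profile immutable once selected, so every modality reads a consistent set of knobs.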

## Testing

```bash
# Run fast unit + property tests (no model downloads)
pytest tests/ -v

# Run all tests including slow (requires model downloads)
pytest tests/ -v -m ""

# Run effectiveness tests against real AI models (requires API credentials)
pytest tests/ -m effectiveness -v

# Run with coverage
pytest tests/ --cov=cloak --cov-report=term-missing
```

171+ tests span property-based tests (Hypothesis), unit tests, and integration tests across all modalities.

## Architecture

```text
cloak/
├── cli.py              # Typer CLI entry point
├── models.py           # Enums, configs, metrics, strength profiles
├── exceptions.py       # Custom exception hierarchy
├── io.py               # File discovery and I/O (all formats)
├── image/
│   ├── clip_encoder.py # OpenCLIP ViT-B/32 wrapper (image + text encoding)
│   └── engine.py       # Targeted PGD attack + LPIPS quality validation
├── video/
│   └── engine.py       # Per-frame cloaking + temporal warm-start + audio
├── audio/
│   ├── engine.py       # PGD attack against Whisper + quality retry
│   └── masker.py       # Psychoacoustic masking (STFT-based)
├── text/
│   ├── engine.py       # Text cloaking orchestrator
│   ├── homoglyph.py    # Unicode confusable injection
│   ├── semantic.py     # LLM synonym replacement (Ollama backend)
│   └── structural.py   # Zero-width character insertion
└── tabular/
    └── engine.py       # Gaussian noise + correlation breaking + categorical swapping
```

## How It Works

### Image (Nightshade-style Targeted PGD)

Instead of just pushing the image embedding away from the original (untargeted), Cloak pushes it toward a completely different concept's CLIP text embedding (targeted). For example, an image of geometric shapes gets pushed toward "a photograph of a sunset over the ocean." This targeted approach is 2-5x more effective than untargeted PGD at the same perturbation budget.
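The loop can be sketched as follows — a minimal targeted PGD under an L-inf budget, with a toy linear "encoder" (a fixed random projection) standing in for CLIP. Everything here (`toy_encode`, the 64-dim input, the decoy target) is an illustrative assumption, not Cloak's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((16, 64))            # toy encoder: 64-d "image" -> 16-d embedding

def toy_encode(x):
    return W @ x

def targeted_pgd(x0, target_emb, eps=16 / 255, alpha=2 / 255, iters=100):
    """Push toy_encode(x) toward target_emb while staying within eps of x0."""
    x = x0.copy()
    for _ in range(iters):
        # Gradient of 0.5 * ||W x - t||^2 w.r.t. x is W^T (W x - t).
        grad = W.T @ (toy_encode(x) - target_emb)
        x = x - alpha * np.sign(grad)         # signed step toward the decoy concept
        x = np.clip(x, x0 - eps, x0 + eps)    # project back into the L-inf ball
        x = np.clip(x, 0.0, 1.0)              # keep a valid pixel range
    return x

x0 = rng.uniform(0.2, 0.8, size=64)                   # original "image"
target = toy_encode(rng.uniform(0.2, 0.8, size=64))   # embedding of a decoy concept
x_adv = targeted_pgd(x0, target)

assert np.max(np.abs(x_adv - x0)) <= 16 / 255 + 1e-9  # perturbation budget held
assert np.linalg.norm(toy_encode(x_adv) - target) < np.linalg.norm(toy_encode(x0) - target)
```

The real attack swaps the linear projection for CLIP's image encoder (with autograd supplying the gradient) and adds the LPIPS constraint as a quality gate.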

### Text (Layered Defense)

Three independent perturbation layers:

  1. Homoglyph injection: Replace Latin characters with visually identical Unicode confusables (Cyrillic, fullwidth) to break BPE tokenization
  2. Semantic shifting: Use a local LLM (Ollama) to rewrite sentences with adversarial vocabulary while preserving meaning
  3. Structural perturbation: Insert invisible zero-width Unicode characters at token boundaries
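Layers 1 and 3 can be sketched together. The confusable map below is a tiny illustrative subset, not Cloak's actual table, and `cloak_text` is a hypothetical helper:

```python
import random

HOMOGLYPHS = {"a": "\u0430", "e": "\u0435", "o": "\u043e", "c": "\u0441", "p": "\u0440"}  # Latin -> Cyrillic
ZERO_WIDTH = "\u200b"  # zero-width space

def cloak_text(text, density=0.10, seed=0):
    rng = random.Random(seed)
    out = []
    for ch in text:
        if ch in HOMOGLYPHS and rng.random() < density:
            out.append(HOMOGLYPHS[ch])  # looks identical, different codepoint
        else:
            out.append(ch)
        if ch == " " and rng.random() < density:
            out.append(ZERO_WIDTH)      # invisible break at a token boundary
    return "".join(out)

original = "a clean example sentence about open source code"
cloaked = cloak_text(original, density=0.3)

assert cloaked != original                                    # codepoints changed
assert len(cloaked.replace(ZERO_WIDTH, "")) == len(original)  # visible length preserved
```

A BPE tokenizer sees the Cyrillic and zero-width codepoints as unfamiliar byte sequences even though the rendered text is unchanged to a human reader.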

### Tabular (Statistical Poisoning)

  1. Gaussian noise: Calibrated per-column noise that preserves macro statistics (mean, std) while scrambling micro-level patterns
  2. Correlation breaking: Shuffle rows within correlated column pairs to destroy the statistical relationships ML models exploit
  3. Categorical swapping: Replace categorical values with proximity-based neighbors
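Steps 1 and 2 can be sketched as below: per-column Gaussian noise scaled to a tolerance of the column's std, plus a shuffle that breaks cross-column correlation while leaving each marginal intact. The 5% tolerance and helper names are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
x = rng.standard_normal(n)
y = 0.9 * x + 0.1 * rng.standard_normal(n)   # strongly correlated column pair

def add_noise(col, tolerance=0.05):
    """Gaussian noise calibrated to a fraction of the column's std."""
    return col + rng.standard_normal(col.shape) * tolerance * col.std()

def break_correlation(col):
    """Shuffle the column: marginal distribution unchanged, joint structure destroyed."""
    return rng.permutation(col)

y_cloaked = break_correlation(add_noise(y))

assert abs(y_cloaked.mean() - y.mean()) < 0.05     # macro stats survive
assert abs(y_cloaked.std() - y.std()) < 0.05
assert np.corrcoef(x, y)[0, 1] > 0.9               # original pair was correlated
assert abs(np.corrcoef(x, y_cloaked)[0, 1]) < 0.1  # correlation destroyed
```

Because a permutation leaves mean and std exactly unchanged, the macro-stat constraint is satisfied by construction; only the noise step consumes the tolerance budget.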

## Dependencies

All open-source with permissive licenses:

| Package | License | Purpose |
|---|---|---|
| open-clip-torch | MIT | CLIP encoder for image attacks |
| lpips | BSD-2 | Perceptual similarity metric |
| openai-whisper | MIT | Audio encoder for PGD attacks |
| sentence-transformers | Apache-2.0 | SBERT similarity verification |
| Pillow | HPND | Image I/O |
| opencv-python | Apache-2.0 | Video frame extraction |
| scikit-image | BSD-3 | SSIM computation |
| torch | BSD-3 | Gradient computation |

Nightshade and Glaze are closed-source and explicitly excluded.

## Core Principles

- **Zero-Training Infrastructure:** White-box attacks against existing encoders — no model training required
- **Local Execution First:** All processing runs locally, no data leaves your machine
- **Defense in Depth:** Multiple perturbation layers per modality
- **Quality Preservation:** All perturbations bounded by perceptual quality metrics (LPIPS, SSIM, PESQ, SBERT)

## License

MIT
