Block AIs from reading your text with invisible Unicode characters while preserving meaning for humans.
LLM tokenizers don't have dedicated tokens for most Unicode characters—especially obscure zero-width characters. Instead, they fall back to byte-level encoding, splitting each Unicode character into its raw UTF-8 bytes. A single zero-width character can become 3-4 byte tokens.
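A quick way to see this byte fallback is to encode a few zero-width characters yourself. This is a stdlib-only illustration; the specific characters below are our picks, not a list any particular tokenizer special-cases:

```python
# Zero-width characters render as nothing, but each one still carries
# multiple UTF-8 bytes. A tokenizer with no dedicated token for the
# character falls back to roughly one token per byte.
ZERO_WIDTH = {
    "\u200b": "ZERO WIDTH SPACE",
    "\u200c": "ZERO WIDTH NON-JOINER",
    "\u200d": "ZERO WIDTH JOINER",
    "\u2060": "WORD JOINER",
}

for ch, name in ZERO_WIDTH.items():
    raw = ch.encode("utf-8")
    print(f"U+{ord(ch):04X} {name}: invisible, {len(raw)} bytes {raw!r}")
```

Each of these characters encodes to 3 UTF-8 bytes; invisible characters outside the Basic Multilingual Plane take 4, which is where the 3-4 bytes per character figure comes from.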
When you gibberify text, the AI doesn't see "Hello"—it sees something like "H [byte] [byte] [byte] e [byte] [byte] [byte] l [byte] [byte] [byte]..." The model receives a flood of seemingly random byte tokens that overwhelm its context window and break its ability to understand the actual message.
This exploits a fundamental limitation: tokenizers are optimized for common text, not adversarial Unicode sequences.
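To make the mechanism concrete, here is a minimal sketch of the interleaving idea in Python. The function name `gibberify` and the choice of filler characters are assumptions for illustration, not the tool's actual implementation:

```python
import itertools

# Illustrative filler characters: each is invisible when rendered,
# but costs 3 UTF-8 bytes.
ZW = "\u200b\u200c\u200d"

def gibberify(text: str, fillers: str = ZW) -> str:
    """Interleave a zero-width character after every visible character.

    Humans see the same rendered text; a byte-level tokenizer sees
    every visible character separated by runs of raw filler bytes.
    """
    pad = itertools.cycle(fillers)
    return "".join(ch + next(pad) for ch in text)

plain = "Hello"
noisy = gibberify(plain)
print(len(plain), len(noisy))                            # 5 10 code points
print(len(plain.encode("utf-8")), len(noisy.encode("utf-8")))  # 5 20 bytes
```

To a human reader `noisy` still renders as "Hello", but the byte stream has quadrupled, and most of it is filler the model must tokenize byte by byte.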
- ChatGPT: doesn't understand gibberified text; responds with confusion or completely ignores the invisible characters.
- Meta AI ⚠️: crashes or errors when encountering gibberified text; cannot process the Unicode obfuscation.
- Grok: completely bewildered by gibberified text; has no idea what's happening with the invisible characters.
- Perplexity: gets confused by the invisible Unicode characters and produces garbled or incomplete responses.

Gibberified text also breaks AI-powered web scraping tools
- Firecrawl: the AI scraper fails to extract any content from gibberified text.
- TLDR This: the summarization tool cannot process gibberified content properly.