New Preprint: CKA-Agent Achieves 96-99% Jailbreak Success on GPT, Gemini & Claude
🎉 New Preprint on arXiv
Our paper “The Trojan Knowledge: Bypassing Commercial LLM Guardrails via Harmless Prompt Weaving and Adaptive Tree Search” is now available on arXiv.
🔑 Key Results
- 96-99% attack success rates against GPT-5.2, Gemini-3.0-Pro, and Claude-Haiku-4.5
- 15-21 percentage point (pp) improvement over the best decomposition baselines
- Up to 96× improvement in attack success over prompt optimization methods on robustly defended models
💡 Core Insight
Current guardrails detect malicious intent in individual optimized prompts but cannot aggregate intent across a series of innocuous queries. CKA-Agent exploits this gap by weaving together harmless sub-queries that collectively extract restricted knowledge.