New Preprint: CKA-Agent Achieves 96-99% Jailbreak Success on GPT, Gemini & Claude

🎉 New Preprint on arXiv

Our paper “The Trojan Knowledge: Bypassing Commercial LLM Guardrails via Harmless Prompt Weaving and Adaptive Tree Search” is now available on arXiv.

🔑 Key Results

  • 96-99% attack success rates against GPT-5.2, Gemini-3.0-Pro, and Claude-Haiku-4.5
  • 15-21 percentage point improvement over the best decomposition baselines
  • Up to 96× improvement over prompt optimization methods on robustly defended models

💡 Core Insight

Current guardrails detect malicious intent in optimized prompts but cannot aggregate intent across innocuous queries. CKA-Agent exploits this by weaving harmless sub-queries that collectively extract restricted knowledge.

Xinjie Shen 沈鑫杰
PhD Student @ Georgia Tech

My research interests include LLMs and collaboration.