New Preprint: CKA-Agent Achieves 96-99% Jailbreak Success on GPT, Gemini & Claude

Xinjie Shen 沈鑫杰

Last updated on Dec 23, 2025 · Research

🎉 New Preprint on arXiv

Our paper “The Trojan Knowledge: Bypassing Commercial LLM Guardrails via Harmless Prompt Weaving and Adaptive Tree Search” is now available on arXiv.

🔑 Key Results

96-99% attack success rates against GPT-5.2, Gemini-3.0-Pro, and Claude-Haiku-4.5
15-21 percentage-point improvement over the best decomposition-based baselines
Up to 96× improvement over prompt-optimization methods on robustly defended models

💡 Core Insight

Current guardrails can detect malicious intent within a single optimized prompt, but they do not aggregate intent across a sequence of individually innocuous queries. CKA-Agent exploits this gap by weaving together harmless sub-queries that, taken collectively, extract restricted knowledge.