<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>AI Security | Academic</title><link>https://xinjie-shen.com/tag/ai-security/</link><atom:link href="https://xinjie-shen.com/tag/ai-security/index.xml" rel="self" type="application/rss+xml"/><description>AI Security</description><generator>Wowchemy (https://wowchemy.com)</generator><language>en-us</language><lastBuildDate>Sat, 01 Nov 2025 00:00:00 +0000</lastBuildDate><image><url>https://xinjie-shen.com/media/icon_hu646f7301b7fde7528ecdae8cec89fc29_9606_512x512_fill_lanczos_center_3.png</url><title>AI Security</title><link>https://xinjie-shen.com/tag/ai-security/</link></image><item><title>New Preprint: CKA-Agent Achieves 96-99% Jailbreak Success on GPT, Gemini &amp; Claude</title><link>https://xinjie-shen.com/post/cka-agent/</link><pubDate>Sat, 01 Nov 2025 00:00:00 +0000</pubDate><guid>https://xinjie-shen.com/post/cka-agent/</guid><description>&lt;h2 id="-new-preprint-on-arxiv">🎉 New Preprint on arXiv&lt;/h2>
&lt;p>Our paper &lt;strong>&amp;ldquo;The Trojan Knowledge: Bypassing Commercial LLM Guardrails via Harmless Prompt Weaving and Adaptive Tree Search&amp;rdquo;&lt;/strong> is now available on &lt;a href="https://arxiv.org/abs/2512.01353" target="_blank" rel="noopener">arXiv&lt;/a>.&lt;/p>
&lt;h2 id="-key-results">🔑 Key Results&lt;/h2>
&lt;ul>
&lt;li>&lt;strong>96-99% attack success rates&lt;/strong> against GPT-5.2, Gemini-3.0-Pro, and Claude-Haiku-4.5&lt;/li>
&lt;li>&lt;strong>15-21 percentage-point improvement&lt;/strong> over the best decomposition baselines&lt;/li>
&lt;li>&lt;strong>Up to 96× improvement&lt;/strong> over prompt-optimization methods on robustly defended models&lt;/li>
&lt;/ul>
&lt;h2 id="-core-insight">💡 Core Insight&lt;/h2>
&lt;p>Current guardrails detect malicious intent in optimized prompts but &lt;strong>cannot aggregate intent across innocuous queries&lt;/strong>. CKA-Agent exploits this gap by weaving together individually harmless sub-queries that collectively extract restricted knowledge.&lt;/p>
&lt;h2 id="-links">🔗 Links&lt;/h2>
&lt;ul>
&lt;li>📄 &lt;a href="https://arxiv.org/abs/2512.01353" target="_blank" rel="noopener">Paper (arXiv)&lt;/a>&lt;/li>
&lt;li>🌐 &lt;a href="https://cka-agent.github.io/" target="_blank" rel="noopener">Project Page&lt;/a>&lt;/li>
&lt;/ul></description></item></channel></rss>