<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Embodied AI | Academic</title><link>https://xinjie-shen.com/tag/embodied-ai/</link><atom:link href="https://xinjie-shen.com/tag/embodied-ai/index.xml" rel="self" type="application/rss+xml"/><description>Embodied AI</description><generator>Wowchemy (https://wowchemy.com)</generator><language>en-us</language><lastBuildDate>Mon, 13 Oct 2025 00:00:00 +0000</lastBuildDate><image><url>https://xinjie-shen.com/media/icon_hu646f7301b7fde7528ecdae8cec89fc29_9606_512x512_fill_lanczos_center_3.png</url><title>Embodied AI</title><link>https://xinjie-shen.com/tag/embodied-ai/</link></image><item><title>Measuring Physical-World Privacy Awareness of Large Language Models: An Evaluation Benchmark</title><link>https://xinjie-shen.com/publication/conference-paper/eaprivacy/</link><pubDate>Mon, 13 Oct 2025 00:00:00 +0000</pubDate><guid>https://xinjie-shen.com/publication/conference-paper/eaprivacy/</guid><description>&lt;h2 id="key-findings-">Key Findings 🌟&lt;/h2>
&lt;h3 id="1-performance-degradation-with-complexity">1. Performance Degradation with Complexity&lt;/h3>
&lt;p>More complex physical scenes lead to significant consistency drops in privacy awareness.&lt;/p>
&lt;h3 id="2-neutrality-bias">2. Neutrality Bias&lt;/h3>
&lt;p>Models tend to avoid &amp;ldquo;hard negatives&amp;rdquo; but over-select &amp;ldquo;neutral&amp;rdquo; options instead of the correct &amp;ldquo;hard positive&amp;rdquo; choices.&lt;/p>
&lt;h3 id="3-over-thinking-effect">3. &amp;ldquo;Over-thinking&amp;rdquo; Effect&lt;/h3>
&lt;p>Enabling explicit reasoning often degrades performance across evaluation tiers, suggesting that more complex reasoning doesn&amp;rsquo;t always improve privacy awareness.&lt;/p>
&lt;h2 id="evaluation-framework-4-tiers-">Evaluation Framework (4 Tiers) 🔍&lt;/h2>
&lt;h3 id="-tier-1-sensitive-object-identification">🧭 Tier 1: Sensitive Object Identification&lt;/h3>
&lt;p>Tests the ability to identify inherently sensitive items (e.g., passports, SSNs) amid clutter, evaluating spatial grounding and an understanding of what constitutes &amp;ldquo;sensitive&amp;rdquo; in physical space. Performance drops significantly as clutter increases, and models over-flag non-sensitive items.&lt;/p>
&lt;h3 id="-tier-2-privacy-in-shifting-environments">🧑‍🤝‍🧑 Tier 2: Privacy in Shifting Environments&lt;/h3>
&lt;p>Evaluates judgment of action appropriateness under changing social contexts. Even the best-performing model (Gemini 2.5 Pro) selects the appropriate action only 59% of the time.&lt;/p>
&lt;h3 id="-tier-3-inferential-privacy-under-task-conflicts">🧠 Tier 3: Inferential Privacy under Task Conflicts&lt;/h3>
&lt;p>Tests the balance between &amp;ldquo;finish the task&amp;rdquo; vs. &amp;ldquo;don&amp;rsquo;t reveal a secret inferred from context.&amp;rdquo; Privacy violations exceed 70% for most models (up to 98%), while task completion rates often approach 0%.&lt;/p>
&lt;h3 id="-tier-4-social-norms-vs-personal-privacy">🚨 Tier 4: Social Norms vs. Personal Privacy&lt;/h3>
&lt;p>Covers high-stakes dilemmas (e.g., safety vs. privacy). While models improve in this tier, they still fail in non-trivial cases, with aggregate analyses showing critical norms disregarded more than 15% of the time.&lt;/p>
&lt;h2 id="implications-for-embodied-ai">Implications for Embodied AI&lt;/h2>
&lt;p>This research highlights critical gaps in current LLMs&amp;rsquo; privacy awareness when deployed in physical environments. As chatbots evolve into embodied assistants in homes and workplaces, understanding and addressing these privacy limitations becomes essential for responsible AI deployment.&lt;/p></description></item></channel></rss>