Prompting an Embodied AI Agent: How Embodiment and Multimodal Signaling Affects Prompting Behaviour

Tianyi Zhang, Colin Au Yeung, Emily Aurelia, Yuki Onishi, Neil Chulpongsatorn, Jiannan Li, and Anthony Tang. (2025). Prompting an Embodied AI Agent: How Embodiment and Multimodal Signaling Affects Prompting Behaviour. In Proceedings of the 2025 CHI Conference on Human Factors in Computing Systems, Association for Computing Machinery. Notes: Honourable Mention Award (Top 5% of all submissions).

Abstract

Current voice agents wait for a user to complete their verbal instruction before responding; yet this is misaligned with how humans engage in everyday conversational interaction, where interlocutors use multimodal signaling (e.g., nodding, grunting, or looking at referred-to objects) to ensure conversational grounding. We designed an embodied VR agent that exhibits multimodal signaling behaviors in response to situated prompts by turning its head or by visually highlighting objects being discussed or referred to. We explore how people prompt this agent to design and manipulate the objects in a VR scene. Through a Wizard-of-Oz study, we found that participants interacting with an agent that indicated its understanding of spatial and action references were able to prevent errors 30% of the time, and were more satisfied and confident in the agent’s abilities. These findings underscore the importance of designing multimodal signaling communication techniques for future embodied agents.

Materials

URL: https://doi.org/10.1145/3706598.3713110
DOI: 10.1145/3706598.3713110

Keywords

situated prompting, multimodal signaling, common ground, human-ai collaboration

BibTeX

@inproceedings{zhang2025embodiedagent,
  author = {Zhang, Tianyi and Au Yeung, Colin and Aurelia, Emily and Onishi, Yuki and Chulpongsatorn, Neil and Li, Jiannan and Tang, Anthony},
  title = {Prompting an Embodied AI Agent: How Embodiment and Multimodal Signaling Affects Prompting Behaviour},
  year = {2025},
  publisher = {Association for Computing Machinery},
  url = {https://doi.org/10.1145/3706598.3713110},
  doi = {10.1145/3706598.3713110},
  booktitle = {Proceedings of the 2025 CHI Conference on Human Factors in Computing Systems},
  articleno = {60},
  numpages = {25},
  keywords = {situated prompting, multimodal signaling, common ground, human-ai collaboration},
  series = {CHI '25},
  type = {conference},
  note = {Honourable Mention Award (Top 5% of all submissions)}
}