Prompting an Embodied AI Agent: How Embodiment and Multimodal Signaling Affects Prompting Behaviour
Zhang, T., Au Yeung, C., Aurelia, E., Onishi, Y., Chulpongsatorn, N., Li, J., and Tang, A. (2025). Prompting an Embodied AI Agent: How Embodiment and Multimodal Signaling Affects Prompting Behaviour. In Proceedings of the 2025 CHI Conference on Human Factors in Computing Systems (CHI '25), Article 60.
Acceptance rate: 27.0% (1198/4444). Honourable Mention Award (Top 5% of all submissions)
Abstract
Current voice agents wait for a user to complete their verbal instruction before responding; yet, this is misaligned with how humans engage in everyday conversational interaction, where interlocutors use multimodal signaling (e.g. nodding, grunting, or looking at referred to objects) to ensure conversational grounding. We designed an embodied VR agent that exhibits multimodal signaling behaviors in response to situated prompts, by turning its head, or by visually highlighting objects being discussed or referred to. We explore how people prompt this agent to design and manipulate the objects in a VR scene. Through a Wizard of Oz study, we found that participants interacting with an agent that indicated its understanding of spatial and action references were able to prevent errors 30% of the time, and were more satisfied and confident in the agent’s abilities. These findings underscore the importance of designing multimodal signaling communication techniques for future embodied agents.
Materials
URL: https://doi.org/10.1145/3706598.3713110
DOI: 10.1145/3706598.3713110
Keywords
situated prompting, multimodal signaling, common ground, human-ai collaboration
BibTeX
@inproceedings{zhang2025embodiedagent,
  author     = {Zhang, Tianyi and Au Yeung, Colin and Aurelia, Emily and Onishi, Yuki and Chulpongsatorn, Neil and Li, Jiannan and Tang, Anthony},
  title      = {Prompting an Embodied AI Agent: How Embodiment and Multimodal Signaling Affects Prompting Behaviour},
  booktitle  = {Proceedings of the 2025 CHI Conference on Human Factors in Computing Systems},
  series     = {CHI '25},
  year       = {2025},
  publisher  = {Association for Computing Machinery},
  articleno  = {60},
  numpages   = {25},
  doi        = {10.1145/3706598.3713110},
  url        = {https://doi.org/10.1145/3706598.3713110},
  keywords   = {situated prompting, multimodal signaling, common ground, human-ai collaboration},
  type       = {conference},
  acceptance = {27.0\% - 1198/4444},
  notes      = {Honourable Mention Award (Top 5\% of all submissions)},
  abstract   = {Current voice agents wait for a user to complete their verbal instruction before responding; yet, this is misaligned with how humans engage in everyday conversational interaction, where interlocutors use multimodal signaling (e.g. nodding, grunting, or looking at referred to objects) to ensure conversational grounding. We designed an embodied VR agent that exhibits multimodal signaling behaviors in response to situated prompts, by turning its head, or by visually highlighting objects being discussed or referred to. We explore how people prompt this agent to design and manipulate the objects in a VR scene. Through a Wizard of Oz study, we found that participants interacting with an agent that indicated its understanding of spatial and action references were able to prevent errors 30\% of the time, and were more satisfied and confident in the agent's abilities. These findings underscore the importance of designing multimodal signaling communication techniques for future embodied agents.},
}