Embodied AI for Vision-Language based Navigation

Healthcare Robotics

A simulation platform that trains AI agents to understand and perform hospital tasks from natural-language instructions, in both 3D and text-based environments, to strengthen their reasoning and real-world readiness.

Institute:
IIT Kharagpur

PI Name:
Prof. Pawan Goya

Technology Readiness Level (TRL)
3

Problem Addressed

This project focuses on developing complementary simulation environments tailored for training embodied agents to perform hospital-based tasks. The environments cater to two specific domains:

Vision-Language Navigation (VLN): Agents learn to navigate a simulated 3D hospital setting using natural language instructions. This addresses the challenge of training agents in visually rich environments.
Text-Based Tasks: Agents interact with a text-based representation of a hospital to perform language-driven reasoning and task execution. This enables language-centric reasoning in a purely text-based scenario (TextWorld).

About the Technology

VLN using Unity: Unity3D serves as the development platform for a visually immersive hospital simulation environment. The environment is modeled on a reference hospital setup, and the agent simulation is then built on a vision-language navigation framework.
Text-based tasks using TextWorld: TextWorld offers a framework for simulating text-based hospital tasks, focusing on the agent’s ability to perform language-driven reasoning and task execution.

Simulation Scope:

First-person, waypoint-based navigation in the simulation environment.
The environment consists of corridors, wards, ICUs, operating rooms, etc.
Dynamic elements: a few pieces of medical equipment.
Textual descriptions of hospital rooms and objects.
Commands issued as text (e.g., “Pick up the thermometer from the cabinet”).
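
To illustrate the text-based side of the scope above, here is a minimal sketch in plain Python of how a room description and a text command might interact. It does not use the TextWorld API; the names `Room`, `TextHospital`, and `step`, the room contents, and the response strings are all hypothetical and chosen only for this example.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class Room:
    """Hypothetical text description of a hospital room and its containers."""
    description: str
    containers: Dict[str, Set[str]] = field(default_factory=dict)


@dataclass
class TextHospital:
    """Toy stand-in for a text environment; not the TextWorld API."""
    room: Room
    inventory: Set[str] = field(default_factory=set)

    def step(self, command: str) -> str:
        # Only one command pattern is handled: "pick up the X from the Y".
        match = re.fullmatch(
            r"pick up the (\w+) from the (\w+)\.?", command.strip().lower()
        )
        if match is None:
            return "I don't understand that command."
        item, container = match.groups()
        contents = self.room.containers.get(container)
        if contents is None:
            return f"There is no {container} here."
        if item not in contents:
            return f"The {container} does not contain a {item}."
        # Move the item from the container to the agent's inventory.
        contents.remove(item)
        self.inventory.add(item)
        return f"You pick up the {item} from the {container}."


def make_ward() -> TextHospital:
    """Build an assumed example ward with a single cabinet."""
    ward = Room(
        "A general ward with a bed and a cabinet.",
        {"cabinet": {"thermometer", "gloves"}},
    )
    return TextHospital(ward)


if __name__ == "__main__":
    env = make_ward()
    print(env.room.description)
    print(env.step("Pick up the thermometer from the cabinet"))
```

A real TextWorld game would supply a much richer parser and state model; the sketch only shows the command-in, text-out loop that the agent is trained against.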

Agent Training:

Input: RGB or RGB-D visual data and natural language instructions using medical vocabulary.
Output: Navigation actions (e.g., move forward, turn left) or textual responses for reasoning tasks.
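
The input/output interface above can be sketched roughly as follows. This is a hand-written heuristic, not the learned policy, and it assumes the simulator also exposes the agent's 2D position and heading, which the source does not state. `Action`, `Observation`, `greedy_waypoint_action`, and the `reach_radius` and `turn_tolerance_deg` parameters are hypothetical names introduced for illustration.

```python
import math
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple


class Action(Enum):
    """Discrete navigation actions (an assumed action set)."""
    MOVE_FORWARD = "move forward"
    TURN_LEFT = "turn left"
    TURN_RIGHT = "turn right"
    STOP = "stop"


@dataclass
class Observation:
    """One step of agent input; rgb/depth are placeholders for image arrays."""
    instruction: str
    position: Tuple[float, float]   # assumed to be provided by the simulator
    heading_deg: float              # 0 = +x axis, counter-clockwise positive
    rgb: Optional[object] = None
    depth: Optional[object] = None


def greedy_waypoint_action(
    obs: Observation,
    waypoint: Tuple[float, float],
    reach_radius: float = 0.5,
    turn_tolerance_deg: float = 15.0,
) -> Action:
    """Pick the action that steers toward the next waypoint."""
    dx = waypoint[0] - obs.position[0]
    dy = waypoint[1] - obs.position[1]
    if math.hypot(dx, dy) <= reach_radius:
        return Action.STOP
    bearing = math.degrees(math.atan2(dy, dx))
    # Wrap the heading error into [-180, 180) so the shorter turn is chosen.
    error = (bearing - obs.heading_deg + 180.0) % 360.0 - 180.0
    if error > turn_tolerance_deg:
        return Action.TURN_LEFT
    if error < -turn_tolerance_deg:
        return Action.TURN_RIGHT
    return Action.MOVE_FORWARD


if __name__ == "__main__":
    obs = Observation("Go to the ICU at the end of the corridor.", (0.0, 0.0), 0.0)
    print(greedy_waypoint_action(obs, (0.0, 5.0)).value)
```

In a trained agent, the rule-based function would be replaced by a model that maps the image and instruction to one of these actions; the sketch only fixes the shape of the inputs and outputs.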

Application Areas & Use Cases

Healthcare Robotics: Train robots to assist with navigation and reasoning in hospital environments.
AI Training for Assistive Systems: Develop intelligent systems for patient care and logistics.
Cognitive Reasoning: Test language models for understanding and executing instructions.