Managers and MDP Terms
GeneLab follows the Isaac Lab-style manager-based environment pattern: the environment owns the simulation loop, while MDP behavior is split into named terms managed by specialized managers.
Why split the MDP
A robot learning environment usually contains many concerns:
- action decoding
- command sampling
- observations
- rewards
- terminations
- reset and interval events
- curricula
- metrics
Putting all of that directly in step() makes tasks hard to inspect and override. Manager-based
configs keep each concern named, typed, and discoverable.
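As a rough illustration of the idea, the sketch below keeps two reward terms under their own names instead of folding them into one step() body. `RewardTerm`, `compute_rewards`, the `alive` term, and the state keys are hypothetical stand-ins written for this example; they are not GeneLab's actual classes or signatures.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# Hypothetical stand-ins for illustration; these are not GeneLab classes.
State = Dict[str, float]


@dataclass
class RewardTerm:
    func: Callable[[State], float]  # computes the unweighted term value
    weight: float                   # scales the term in the total reward


def track_lin_vel(state: State) -> float:
    # Negative squared error between commanded and measured velocity.
    return -((state["cmd_vel"] - state["vel"]) ** 2)


def alive(state: State) -> float:
    # Constant bonus for every step the episode continues.
    return 1.0


def compute_rewards(terms: Dict[str, RewardTerm], state: State) -> Dict[str, float]:
    # Each weighted term stays individually inspectable under its own name.
    return {name: term.weight * term.func(state) for name, term in terms.items()}


terms = {
    "track_lin_vel": RewardTerm(track_lin_vel, weight=2.0),
    "alive": RewardTerm(alive, weight=0.5),
}
state = {"cmd_vel": 1.0, "vel": 0.5}
per_term = compute_rewards(terms, state)
total = sum(per_term.values())
```

Because every term is addressed by name, changing one weight or swapping one function does not require touching the others.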
The manager set
| Manager | Config field | Role |
|---|---|---|
| ActionManager | actions_cfg | Converts policy actions into simulator targets or torques. |
| CommandManager | commands_cfg | Maintains sampled goals such as target velocity or reference motion. |
| ObservationManager | observations_cfg | Computes named observation groups such as policy and critic. |
| RewardManager | rewards_cfg | Computes weighted reward terms and episode summaries. |
| TerminationManager | terminations_cfg | Separates terminated and time-out conditions. |
| EventManager | events_cfg | Runs startup, reset, and interval randomization or perturbation. |
| CurriculumManager | curriculum_cfg | Adjusts difficulty or task state at reset boundaries. |
| MetricsManager | metrics_cfg | Records non-reward diagnostics. |
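One way to picture how the eight config fields sit side by side is a plain dataclass with one field per manager. This is a minimal sketch under assumptions: only the field names come from the table; the `SketchEnvCfg` class, the `TermCfgs` alias, and the dictionary layout of each term are hypothetical and are not GeneLab's real config types.

```python
from dataclasses import dataclass, field, fields
from typing import Any, Dict

# Hypothetical layout: each manager field maps term names to term settings.
TermCfgs = Dict[str, Dict[str, Any]]


@dataclass
class SketchEnvCfg:
    """Hypothetical container; only the field names come from the table."""

    actions_cfg: TermCfgs = field(default_factory=dict)
    commands_cfg: TermCfgs = field(default_factory=dict)
    observations_cfg: TermCfgs = field(default_factory=dict)
    rewards_cfg: TermCfgs = field(default_factory=dict)
    terminations_cfg: TermCfgs = field(default_factory=dict)
    events_cfg: TermCfgs = field(default_factory=dict)
    curriculum_cfg: TermCfgs = field(default_factory=dict)
    metrics_cfg: TermCfgs = field(default_factory=dict)


cfg = SketchEnvCfg(
    rewards_cfg={"track_lin_vel": {"weight": 1.0}},
    observations_cfg={"policy": {}, "critic": {}},
)

# Typed fields make the manager set discoverable by introspection.
manager_fields = [f.name for f in fields(cfg)]
```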
Runtime order
ManagerBasedRlEnv builds the Genesis scene first, then creates managers, binds sensors, applies
startup events, and runs an initial reset. During each env step it processes actions, advances the
scene for decimation physics ticks, refreshes state, and updates sensors. It then handles commands,
events, rewards, metrics, and terminations, in that order, resets done envs, and finally computes
observations.
This order matters because terms read shared state. For example, reward terms see the state after actions and sensor updates; reset events run before the next observation is emitted.
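The per-step ordering described above can be sketched as a toy environment that only records which phase runs when. `SketchEnv` and the phase labels are hypothetical names chosen for this example; the real ManagerBasedRlEnv method names may differ.

```python
from typing import List


class SketchEnv:
    """Hypothetical environment that only records the order of step phases."""

    # Phases that follow the physics ticks, in the order given in the text.
    POST_PHYSICS_PHASES = (
        "refresh_state",
        "update_sensors",
        "commands",
        "events",
        "rewards",
        "metrics",
        "terminations",
        "reset_done_envs",
        "observations",
    )

    def __init__(self, decimation: int) -> None:
        self.decimation = decimation  # physics ticks per env step

    def step(self) -> List[str]:
        log = ["process_actions"]
        # Advance the scene once per physics tick.
        log.extend("physics_tick" for _ in range(self.decimation))
        log.extend(self.POST_PHYSICS_PHASES)
        return log


env = SketchEnv(decimation=2)
order = env.step()
```

In this sketch, rewards are logged after the sensor update and done envs are reset before observations, matching the two examples given in the text.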
Term names are part of the interface
Term names appear in override paths and logs. A reward named track_lin_vel can be changed from the
CLI and appears in episode summaries. Stable names make experiments easier to compare.
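To make the link between term names and override paths concrete, here is a minimal sketch of applying a dotted override to a nested config. The `rewards_cfg.track_lin_vel.weight=2.0` syntax and the `apply_override` helper are assumptions made for illustration; the CLI format GeneLab actually accepts is not specified here.

```python
from typing import Any, Dict


def apply_override(cfg: Dict[str, Any], override: str) -> None:
    """Apply a hypothetical 'a.b.c=value' override to a nested dict in place."""
    path, raw_value = override.split("=", 1)
    *parents, leaf = path.split(".")
    node = cfg
    for key in parents:
        # A renamed or missing term surfaces here as a KeyError.
        node = node[key]
    if leaf not in node:
        raise KeyError(leaf)
    # This sketch only handles float-valued settings such as weights.
    node[leaf] = float(raw_value)


cfg = {"rewards_cfg": {"track_lin_vel": {"weight": 1.0}}}
apply_override(cfg, "rewards_cfg.track_lin_vel.weight=2.0")

# An override that targets a renamed term fails instead of silently passing.
try:
    apply_override(cfg, "rewards_cfg.lin_vel_tracking.weight=2.0")
    stale_override_failed = False
except KeyError:
    stale_override_failed = True
```

Renaming a term breaks every override path and log key that mentions it, which is why stable names keep experiments comparable.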