Eval and Export
GeneLab's research-reproducibility tooling lives under
genelab.rl.evaluator / eval_callback / exporter and is surfaced as three
CLI commands that close the train → eval → export loop:
| Command | Purpose | Output |
|---|---|---|
| genelab eval TASK CKPT | Deterministic rollout, fixed seed, N episodes | eval.json |
| genelab train ... --eval-every K | Periodic in-training eval + best-model save | logs/.../best_model.<ext> + best_model_meta.json |
| genelab export TASK CKPT | Backend-agnostic TorchScript / ONNX policy | policy.{ts,onnx} + <file>.metadata.json |
All three route through the same backend abstraction (InferenceSetup, defined
in genelab.rl.backends.base), so they work identically against the rsl_rl,
skrl, and sb3 backends.
genelab eval
Runs a vectorized, deterministic rollout of a checkpoint and writes a JSON summary:
```bash
genelab eval GeneLab-Inverted-Pendulum-v0 logs/rsl_rl/exp1/.../model_500.pt \
  --num-envs 64 --episodes 100 --seed 0 \
  --deterministic --out eval.json
```
Output:
```json
{
  "task": "GeneLab-Inverted-Pendulum-v0",
  "checkpoint": "logs/.../model_500.pt",
  "num_episodes": 100,
  "metrics": {
    "return_mean": 487.3,
    "return_std": 22.1,
    "length_mean": 998.4,
    "success_rate": 0.96
  },
  "wall_clock_seconds": 18.2,
  "seed": 0,
  "deterministic": true,
  "evaluated_at": "2026-05-20T08:42:11+00:00"
}
```
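The sketch below shows one way per-episode results could be reduced to the metrics block above. Use it when re-deriving or auditing those numbers. It is not GeneLab's evaluator, and several details are assumptions: the helper name summarize_episodes and its inputs are hypothetical, and return_std is taken as the population standard deviation.

```python
from statistics import fmean, pstdev
from typing import Optional, Sequence


def summarize_episodes(
    returns: Sequence[float],
    lengths: Sequence[int],
    successes: Optional[Sequence[bool]] = None,
) -> dict:
    """Reduce per-episode results to an eval.json-style "metrics" block.

    Hypothetical helper. return_std uses the population std (an assumption);
    successes is None when the task never published extras["is_success"].
    """
    if not returns or len(returns) != len(lengths):
        raise ValueError("need one return and one length per episode")
    return {
        "return_mean": fmean(returns),
        "return_std": pstdev(returns),
        "length_mean": fmean(lengths),
        # None serializes to JSON null, matching tasks without is_success.
        "success_rate": None if successes is None else fmean(map(float, successes)),
    }
```

For example, summarize_episodes([10.0, 20.0], [100, 200], [True, False]) yields a return_mean of 15.0, a return_std of 5.0, and a success_rate of 0.5.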
Success rate
success_rate is computed when the task publishes a per-env bool tensor at
extras["is_success"] from ManagerBasedRlEnv.step (gymnasium convention).
Tasks opt in by setting self._extras["is_success"] = <(num_envs,) bool tensor>
inside a termination or reward term — typically a check against the goal pose
for manipulation or a "reached velocity command" check for locomotion.
Tasks that do not publish is_success get success_rate: null in the
output, which becomes None once the JSON is loaded in Python. Downstream tools such as
best-model selection and reference-runs tables should therefore guard against None.
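Here is one way such a guard might look. A reference-runs table could rank metrics blocks by success rate and show "n/a" for tasks that never publish is_success. Both helper names are hypothetical. The input dicts are assumed to be the "metrics" block of eval.json.

```python
from typing import Iterable, List


def success_cell(metrics: dict) -> str:
    """Format success_rate for a table cell; JSON null arrives as Python None."""
    rate = metrics.get("success_rate")
    return "n/a" if rate is None else f"{rate:.0%}"


def rank_by_success(all_metrics: Iterable[dict]) -> List[dict]:
    """Sort metrics blocks by success_rate, highest first, with None entries last."""
    def key(m: dict):
        rate = m.get("success_rate")
        # Tuples sort on the None flag first, so runs without a rate go last.
        return (rate is None, -(rate or 0.0))
    return sorted(all_metrics, key=key)
```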
genelab train --eval-every
When --eval-every K is set, training runs in chunks of K iterations. After
each chunk the latest checkpoint is loaded into the same backend and a
deterministic eval is run (defaulting to 10 episodes at the same num_envs as
training). When return_mean improves on the prior best, the checkpoint is
copied to <log_dir>/best_model.<ext> and a sibling best_model_meta.json is
updated with the eval payload.
```bash
genelab train GeneLab-Inverted-Pendulum-v0 \
  --max_iterations 1000 --num_envs 64 --seed 0 \
  --eval-every 100 --eval-episodes 16
```
Caveats:
- Each chunk closes and rebuilds the Genesis env via the backend's normal train
  lifecycle. For short tasks, pick --eval-every ≥ 50 so Genesis init time is amortized.
- For off-policy algorithms (SAC / TD3 / DDPG via skrl or sb3), reloading from a
  checkpoint between chunks loses the replay buffer. Sample efficiency degrades, but
  training still converges.

best_model.<ext> reuses the source backend's checkpoint format (.pt for rsl_rl / skrl,
.zip for sb3). The metadata file records the source iteration, eval seed, episodes, and
return statistics.
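The sketch below illustrates the chunked schedule and best-model bookkeeping described above. It is not GeneLab's eval_callback. train_chunk and evaluate are hypothetical callables that stand in for the backend's train lifecycle and the deterministic eval, and the "source_iteration" key is a hypothetical name. Only best_model.<ext>, best_model_meta.json, and return_mean come from the text.

```python
import json
import shutil
from pathlib import Path
from typing import Callable, Optional


def train_with_periodic_eval(
    log_dir: Path,
    max_iterations: int,
    eval_every: int,
    train_chunk: Callable[[int, int], Path],  # hypothetical: (start, stop) -> checkpoint path
    evaluate: Callable[[Path], dict],         # hypothetical: checkpoint -> eval.json-style payload
) -> Optional[dict]:
    best: Optional[dict] = None
    for start in range(0, max_iterations, eval_every):
        stop = min(start + eval_every, max_iterations)
        ckpt = train_chunk(start, stop)  # train iterations [start, stop)
        payload = evaluate(ckpt)         # deterministic eval of the latest checkpoint
        score = payload["metrics"]["return_mean"]
        if best is None or score > best["metrics"]["return_mean"]:
            # "source_iteration" is a hypothetical key for the chunk's last iteration.
            best = {**payload, "source_iteration": stop}
            # Reuse the backend's checkpoint extension (.pt or .zip).
            shutil.copyfile(ckpt, log_dir / f"best_model{ckpt.suffix}")
            (log_dir / "best_model_meta.json").write_text(json.dumps(best, indent=2))
    return best
```

Checkpoints are only copied on improvement, so best_model.<ext> always pairs with the payload in best_model_meta.json.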
genelab export
Serializes the actor sub-network to TorchScript or ONNX with per-term
obs scale / clip baked into a single forward(raw_obs) -> actions pass.
Deployment environments need only torch (TorchScript) or an ONNX runtime;
they do not need rsl_rl / skrl / stable_baselines3 at inference time.
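A plain-NumPy reference of the per-term scale / clip step can help when checking a deployment pipeline against the exported model. This is a sketch built on two assumptions. First, the term descriptors are shaped like the per-term entries the exporter records in its sibling metadata file (start, dim, scale, clip). Second, the ordering follows the Isaac Lab convention of clip first, then scale. GeneLab's own order should be confirmed before relying on it.

```python
import numpy as np


def preprocess(raw_obs: np.ndarray, terms: list) -> np.ndarray:
    """Apply per-term clip, then scale, to a (batch, obs_dim) raw observation.

    terms: dicts with "start", "dim", "scale", and "clip" (None or [lo, hi]).
    Clip-before-scale is an assumption, not confirmed GeneLab behavior.
    """
    out = np.array(raw_obs, dtype=np.float32, copy=True)  # leave the input untouched
    for t in terms:
        cols = slice(t["start"], t["start"] + t["dim"])
        if t["clip"] is not None:
            lo, hi = t["clip"]
            out[:, cols] = np.clip(out[:, cols], lo, hi)
        out[:, cols] *= t["scale"]
    return out
```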
```bash
# TorchScript
genelab export Genelab-Velocity-Flat-Unitree-G1-v0 logs/.../model_30000.pt \
  --format torchscript --out policy.ts

# ONNX (opset 17 by default)
genelab export Genelab-Velocity-Flat-Unitree-G1-v0 logs/.../model_30000.pt \
  --format onnx --out policy.onnx --opset 17
```
Note:
GeneLab-Franka-Pick-And-Place-v0 is SAC+HER with a goal-conditioned Dict observation. Its exported model takes a single flat obs that is the concatenation of observation + achieved_goal + desired_goal (in that order); see the multi-group metadata below. Tasks without goal conditioning, such as cartpole and G1 locomotion, use a single flat-tensor obs group.
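On the deployment side, a goal-conditioned observation must be flattened in that same order before it reaches the exported model. The sketch below assumes the robot stack produces a dict with the three keys named above. flatten_goal_obs is a hypothetical helper.

```python
import numpy as np

# Concatenation order the exported SAC+HER model expects.
GOAL_KEYS = ("observation", "achieved_goal", "desired_goal")


def flatten_goal_obs(obs: dict) -> np.ndarray:
    """Concatenate observation + achieved_goal + desired_goal along the last axis."""
    parts = [np.asarray(obs[k], dtype=np.float32) for k in GOAL_KEYS]
    return np.concatenate(parts, axis=-1)
```

Iterating a fixed key tuple, rather than the dict itself, keeps the layout correct regardless of the dict's insertion order.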
The exporter writes a sibling <output>.metadata.json describing the obs
schema. obs_dim is the total flat-input width. Each obs_groups entry records
its start offset into that flat tensor, so goal-conditioned policies can be
sliced back into their sub-spaces. The example lists only two of the policy group's
terms, so their dims add up to less than obs_dim:
```json
{
  "task": "Genelab-Velocity-Flat-Unitree-G1-v0",
  "checkpoint": "logs/.../model_30000.pt",
  "obs_dim": 48,
  "obs_groups": {
    "policy": {
      "start": 0,
      "dim": 48,
      "terms": [
        {"name": "joint_pos", "dim": 23, "start": 0, "scale": 1.0, "clip": null},
        {"name": "joint_vel", "dim": 23, "start": 23, "scale": 0.1, "clip": [-2, 2]}
      ]
    }
  },
  "action_dim": 23,
  "action_range": [-1.0, 1.0],
  "normalization_baked": true,
  "is_recurrent": false,
  "format": "torchscript",
  "exported_at": "2026-05-20T08:42:11+00:00",
  "torch_version": "2.4.0"
}
```
For a SAC+HER task, obs_groups has three entries, e.g. observation
(start: 0), achieved_goal (start: 35), and desired_goal (start: 38), and
obs_dim is the sum of their dims.
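Going the other way, the start / dim offsets are enough to recover each sub-space from a flat observation. The sketch below assumes the metadata layout shown above, loaded into a dict. split_obs is a hypothetical helper.

```python
import numpy as np


def split_obs(flat_obs: np.ndarray, metadata: dict) -> dict:
    """Slice a (..., obs_dim) flat observation into its obs_groups."""
    width = flat_obs.shape[-1]
    if width != metadata["obs_dim"]:
        raise ValueError(f"expected obs_dim={metadata['obs_dim']}, got {width}")
    # Each group occupies columns [start, start + dim) of the flat tensor.
    return {
        name: flat_obs[..., g["start"]: g["start"] + g["dim"]]
        for name, g in metadata["obs_groups"].items()
    }
```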
Deployment-side usage
```python
import torch

m = torch.jit.load("policy.ts")
m.eval()
# Raw obs in, concatenated in training-side order; the model applies scale/clip itself.
raw_obs = torch.zeros(1, 48)  # placeholder (batch, obs_dim); obs_dim is 48 in the G1 example
with torch.no_grad():
    actions = m(raw_obs)  # (batch, action_dim)
```
For ONNX:
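```python
import numpy as np
import onnxruntime as ort
sess = ort.InferenceSession("policy.onnx")
raw_obs = np.zeros((1, 48))  # placeholder (batch, obs_dim) raw obs
actions = sess.run(None, {"obs": raw_obs.astype(np.float32)})[0]  # (batch, action_dim)
```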
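ONNX Runtime is strict about input dtype and rank, so a small guard before sess.run catches layout mistakes early. The sketch below assumes the metadata has been loaded from the exporter's sibling JSON file into a dict. as_model_input is a hypothetical helper.

```python
import numpy as np


def as_model_input(raw_obs, metadata: dict) -> np.ndarray:
    """Return raw_obs as a contiguous float32 (batch, obs_dim) array."""
    x = np.asarray(raw_obs, dtype=np.float32)
    if x.ndim == 1:
        x = x[None, :]  # a single observation becomes a batch of one
    if x.ndim != 2 or x.shape[1] != metadata["obs_dim"]:
        raise ValueError(f"expected (batch, {metadata['obs_dim']}), got {x.shape}")
    return np.ascontiguousarray(x)
```

Usage: actions = sess.run(None, {"obs": as_model_input(raw_obs, metadata)})[0].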
What's exported
The actor is extracted via a backend-specific shim and wrapped so the call shape is uniform:
- rsl_rl: takes the actor module off the algorithm directly (alg._raw_actor, falling back to alg.actor) and uses its as_jit() export wrapper, which exposes a flat forward(obs) -> deterministic action with the learned obs normalizer baked in. Older releases that kept the actor under alg.actor_critic.actor (or only act_inference) are still supported.
- skrl: wraps agent.policy.act and returns the deterministic mean (the mean_actions key) for GaussianMixin policies.
- sb3: wraps model.policy._predict(obs, deterministic=True), which is uniform across PPO / A2C / SAC / TD3 / DDPG. For goal-conditioned SAC+HER policies, the observation space is a Dict (observation / achieved_goal / desired_goal) and the SAC actor consumes all keys. The wrapper therefore takes the flat concatenation of those sub-spaces (in that order) and rebuilds the Dict before calling the policy. The exported model still has a single flat obs input, and the metadata's obs_groups records each sub-space's start / dim so deployers know the layout.
SB3's ONNX export uses the legacy TorchScript-based exporter
(torch.onnx.export(..., dynamo=False)), because the torch.export-based default (torch ≥ 2.9) can't trace SAC's Normal distribution construction.
Recurrent (RNN / LSTM / GRU) policy export
Setting rnn_type ("lstm" or "gru") on an RslRlModelCfg trains a recurrent
policy. It is the only knob needed: it automatically selects rsl_rl's RNNModel.
genelab export then takes the recurrent path automatically (no extra flags). The
metadata sets "is_recurrent": true and gains a "recurrent" block recording
rnn_type, rnn_num_layers, rnn_hidden_dim, the hidden-state shape, and the ONNX
port names.
The two formats expose the hidden state differently:
- TorchScript keeps the hidden state inside the module, so the call shape stays
  the single-input MLP form forward(obs) -> actions. The module also exposes a reset()
  method; call it at every episode boundary to zero the hidden state. The serialized
  buffer is fixed at batch size 1 (one deployed robot).
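```python
import torch
m = torch.jit.load("policy.ts")
m.eval()
m.reset()  # at the start of each episode: zero the internal hidden state
raw_obs = torch.zeros(1, 48)  # placeholder (1, obs_dim); batch size is fixed at 1
actions = m(raw_obs)  # raw obs in; hidden state carried internally between calls
```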
- ONNX exposes the hidden state explicitly. Inputs are obs and h_in (plus c_in for
  LSTM); outputs are actions and h_out (plus c_out for LSTM). The hidden-state tensors
  are each shaped (num_layers, batch, hidden_dim). Thread the returned state back in
  at each step and zero it on episode boundaries:
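```python
import numpy as np
import onnxruntime as ort
sess = ort.InferenceSession("policy.onnx")
# Placeholders: read num_layers / hidden_dim from the metadata's "recurrent" block.
num_layers, hidden_dim, obs_dim = 1, 64, 48
raw_obs = np.zeros((1, obs_dim), np.float32)
h = np.zeros((num_layers, 1, hidden_dim), np.float32)  # zero again at each episode start
c = np.zeros((num_layers, 1, hidden_dim), np.float32)  # LSTM only
actions, h, c = sess.run(None, {"obs": raw_obs, "h_in": h, "c_in": c})
# GRU: actions, h = sess.run(None, {"obs": raw_obs, "h_in": h})
```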
Play and eval rollouts automatically reset the hidden state for environments whose episode just ended. State therefore never carries over from one episode into the next, and recurrent eval metrics stay unbiased.
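With a batch of environments, resetting the envs whose episode just ended amounts to zeroing their slice of the hidden state. The NumPy sketch below shows that masking. It assumes the (num_layers, batch, hidden_dim) layout of the ONNX ports and a boolean done flag per env. reset_done_envs is a hypothetical helper, not GeneLab's rollout code.

```python
import numpy as np


def reset_done_envs(state: np.ndarray, done) -> np.ndarray:
    """Zero the hidden state of envs whose episode just ended.

    state: (num_layers, num_envs, hidden_dim); done: (num_envs,) bools.
    Returns a new array; envs still mid-episode keep their state.
    """
    done = np.asarray(done, dtype=bool)
    # Broadcast the per-env flag over the layer and hidden dimensions.
    return np.where(done[None, :, None], state.dtype.type(0), state)
```

For an LSTM, apply it to both h and c after each step.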
Limitations
- The exported model does not apply observation noise from
ObservationTermCfg.noise; noise is part of training only.