How to Run RL Experiments

The practical flow for training and replaying GeneLab tasks with genelab.rl.

1. Smoke-testing before training

Run these before any long job:

genelab info TASK_ID
genelab play TASK_ID --agent zero --steps 32
genelab play TASK_ID --agent random --steps 64
genelab train TASK_ID --num_envs 64 --max_iterations 2

Together, these commands exercise registry loading, environment construction, action dimensions, observation groups, rewards, terminations, and runner wiring.
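The four commands above can be chained from a small script so that a failure stops the sequence early. This is a minimal sketch, not part of GeneLab: the helper names are hypothetical, and it assumes the genelab CLI returns a non-zero exit status when a step fails, which this page does not state.

```python
import shlex
import subprocess


def smoke_test_commands(task_id):
    """Return the four smoke-test commands from this section as argv lists."""
    return [
        ["genelab", "info", task_id],
        ["genelab", "play", task_id, "--agent", "zero", "--steps", "32"],
        ["genelab", "play", task_id, "--agent", "random", "--steps", "64"],
        ["genelab", "train", task_id, "--num_envs", "64", "--max_iterations", "2"],
    ]


def run_smoke_tests(task_id, runner=subprocess.run):
    """Run each command in order; check=True raises on a non-zero exit.

    Assumption: genelab signals failure through its exit status.
    """
    for argv in smoke_test_commands(task_id):
        print("+", shlex.join(argv))
        runner(argv, check=True)


# Example (requires genelab on PATH): run_smoke_tests("TASK_ID")
```

The runner argument exists only so the sequence can be dry-run or tested without launching the CLI.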

2. Picking the policy source for playback

Agent   | Behavior                                     | Use case
zero    | Sends zero actions.                          | Passive physics and reset health checks.
random  | Sends uniform random actions in [-1, 1].     | Action bounds and stability checks.
trained | Loads a checkpoint through the task runner.  | Inspect trained behavior.

Checkpoint playback:

genelab play TASK_ID \
  --agent trained \
  --checkpoint logs/rsl_rl/<experiment>/<run>/model_300.pt

Passing --checkpoint sets --agent to trained by default, so the --agent flag can be omitted.
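When a run directory holds many checkpoints, a short helper can pick the one to replay. The sketch below assumes that the number in model_<N>.pt is the training iteration and that the largest N is the most recent checkpoint; the page shows the naming pattern but does not state this. The function name is hypothetical.

```python
import re
from pathlib import Path

_CHECKPOINT_NAME = re.compile(r"model_(\d+)\.pt")


def latest_checkpoint(run_dir):
    """Return the model_<N>.pt path with the largest N, or None if none exist.

    Assumption: N is the training iteration, so the largest N is the newest.
    """
    best = None
    for path in Path(run_dir).glob("model_*.pt"):
        match = _CHECKPOINT_NAME.fullmatch(path.name)
        if match is None:
            continue  # skip files such as model_best.pt
        iteration = int(match.group(1))
        if best is None or iteration > best[0]:
            best = (iteration, path)
    return best[1] if best else None
```

Comparing the parsed integers rather than the file names matters here: a plain string sort would rank model_300.pt after model_1000.pt.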

3. Controlling env count deliberately

Single process:

genelab train TASK_ID --num_envs 4096

Distributed:

genelab train TASK_ID --gpus 4 --num_envs 4096

With --gpus 4, --num_envs 4096 is the total across all ranks: 4096 environments overall, or 1024 per rank. To set the per-rank count directly, use --num_envs_per_gpu.
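The total-versus-per-rank arithmetic is easy to check before launching a job. This is a minimal sketch with a hypothetical helper name. How GeneLab handles a total that does not divide evenly across ranks is not stated on this page, so the sketch simply rejects that case.

```python
def envs_per_rank(num_envs, gpus=1):
    """Split a total environment count evenly across ranks.

    Assumption: uneven splits are treated as an error here because the
    page does not say how GeneLab handles them.
    """
    if gpus < 1 or num_envs < 1:
        raise ValueError("num_envs and gpus must both be positive")
    per_rank, remainder = divmod(num_envs, gpus)
    if remainder:
        raise ValueError(
            f"{num_envs} environments do not divide evenly across {gpus} ranks"
        )
    return per_rank


# Matches the example in the text: 4096 total on 4 GPUs is 1024 per rank.
assert envs_per_rank(4096, gpus=4) == 1024
```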

4. Keeping logs reproducible

Use meaningful experiment_name and run_name in RslRlOnPolicyRunnerCfg. GeneLab writes:

logs/rsl_rl/<experiment>/<timestamp-or-run>/
├── params/env.json
├── params/agent.json
└── model_*.pt

Pass --log_dir PATH only when you need an exact output directory, for example for a distributed relaunch or a scripted comparison.
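Because each run directory stores its configuration under params/, two runs can be compared by diffing those JSON files. The sketch below assumes only that params/env.json and params/agent.json are ordinary JSON documents; their schema is not described here, so the helper names are hypothetical and the dotted key paths it reports are simply whatever the files contain.

```python
import json
from pathlib import Path

_MISSING = "<missing>"


def flatten(obj, prefix=""):
    """Flatten nested dicts into {"a.b.c": value}; lists are kept as leaves."""
    flat = {}
    if isinstance(obj, dict):
        for key, value in obj.items():
            flat.update(flatten(value, f"{prefix}{key}."))
    else:
        flat[prefix.rstrip(".")] = obj
    return flat


def diff_params(run_a, run_b, name="env.json"):
    """Return {key_path: (value_in_a, value_in_b)} for every differing entry."""
    a = flatten(json.loads((Path(run_a) / "params" / name).read_text()))
    b = flatten(json.loads((Path(run_b) / "params" / name).read_text()))
    return {
        key: (a.get(key, _MISSING), b.get(key, _MISSING))
        for key in sorted(a.keys() | b.keys())
        if a.get(key, _MISSING) != b.get(key, _MISSING)
    }
```

Calling diff_params(run_a, run_b, name="agent.json") compares the agent configuration the same way.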

5. Profiling short runs first

genelab train TASK_ID \
  --prof \
  --prof-active 3 \
  --prof-repeat 1 \
  --max_iterations 10
genelab prof open logs/torch_profile

Profiler traces grow quickly. Start with a small active window and increase only after confirming the trace size is manageable.
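One way to confirm that a trace is manageable is to measure the output directory before widening the active window. This sketch takes the logs/torch_profile path from the command above. It makes no assumption about the files inside it and just sums every regular file under the directory; the helper name is hypothetical.

```python
from pathlib import Path


def dir_size_bytes(root):
    """Total size in bytes of all regular files under root, recursively."""
    return sum(p.stat().st_size for p in Path(root).rglob("*") if p.is_file())


# Example: print(dir_size_bytes("logs/torch_profile") / 1e6, "MB")
```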

Expected result

Every long run should have a passing smoke test, explicit env count semantics, inspectable params/*.json, and a known checkpoint replay command.