How to Run RL Experiments

This guide covers the practical flow for smoke-testing, training, and replaying GeneLab tasks with genelab.rl.

1. Smoke-testing before training

Run these before any long job:

genelab info TASK_ID
genelab play TASK_ID --agent zero --steps 32
genelab play TASK_ID --agent random --steps 64
genelab train TASK_ID --num_envs 64 --max_iterations 2

Together, these four commands exercise registry loading, env construction, action dimensions, observation groups, rewards, terminations, and runner wiring.
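
The four commands above can be chained so that a failure stops the sequence before the short training run. The following is a minimal sketch, not part of GeneLab: `smoke_commands` and `run_smoke` are hypothetical helper names, and the sketch assumes that `genelab` is on `PATH` and exits with a non-zero status on failure.

```python
import subprocess
from typing import Callable, List


def smoke_commands(task_id: str) -> List[List[str]]:
    """Return the four smoke-test commands from this section, in order."""
    return [
        ["genelab", "info", task_id],
        ["genelab", "play", task_id, "--agent", "zero", "--steps", "32"],
        ["genelab", "play", task_id, "--agent", "random", "--steps", "64"],
        ["genelab", "train", task_id, "--num_envs", "64", "--max_iterations", "2"],
    ]


def run_smoke(task_id: str, run: Callable[[List[str]], int] = subprocess.call) -> bool:
    """Run each command in turn and stop at the first non-zero exit code.

    Assumption: a failing genelab command returns a non-zero exit status.
    """
    for cmd in smoke_commands(task_id):
        print("+", " ".join(cmd))
        if run(cmd) != 0:
            print("smoke test failed at:", " ".join(cmd))
            return False
    return True
```

Calling `run_smoke("TASK_ID")` with a real task ID returns `True` only if all four commands succeed. The `run` parameter exists so the sequence can be dry-run or tested without launching a simulator.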

2. Picking the policy source for playback

Agent	Behavior	Use case
`zero`	Sends zero actions.	Passive physics and reset health checks.
`random`	Sends uniform random actions in `[-1, 1]`.	Action bounds and stability checks.
`trained`	Loads a checkpoint through the task runner.	Inspect trained behavior.

Checkpoint playback:

genelab play TASK_ID \
  --agent trained \
  --checkpoint logs/rsl_rl/<experiment>/<run>/model_300.pt

When --checkpoint is passed, --agent defaults to trained, so the --agent flag can be omitted.
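
When a run directory holds many checkpoints, it helps to pick the one with the highest iteration number programmatically, since a plain alphabetical sort would place `model_50.pt` after `model_300.pt`. The sketch below assumes checkpoints follow the `model_<iteration>.pt` naming shown above; `latest_checkpoint` and `play_command` are hypothetical helpers, not GeneLab functions.

```python
import re
from pathlib import Path
from typing import List, Optional

# Matches names such as model_300.pt and captures the iteration number.
_CKPT_NAME = re.compile(r"model_(\d+)\.pt")


def latest_checkpoint(run_dir: str) -> Optional[Path]:
    """Return the checkpoint with the largest iteration number, or None."""
    best: Optional[Path] = None
    best_iter = -1
    for path in Path(run_dir).glob("model_*.pt"):
        match = _CKPT_NAME.fullmatch(path.name)
        if match is None:
            continue  # skip files that do not carry a numeric iteration
        iteration = int(match.group(1))
        if iteration > best_iter:
            best, best_iter = path, iteration
    return best


def play_command(task_id: str, checkpoint: Path) -> List[str]:
    """Build the checkpoint playback command shown above."""
    return ["genelab", "play", task_id,
            "--agent", "trained",
            "--checkpoint", str(checkpoint)]
```

The resulting list can be handed to `subprocess.run` or printed and stored next to the run as its known replay command.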

3. Controlling env count deliberately

Single process:

genelab train TASK_ID --num_envs 4096

Distributed:

genelab train TASK_ID --gpus 4 --num_envs 4096

With --gpus 4, --num_envs 4096 is the total across all ranks: 4096 environments overall, or 1024 per rank. Use --num_envs_per_gpu to specify the per-rank value directly.
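
The total-versus-per-rank arithmetic can be checked before launching a job. This is a small sketch with a hypothetical helper, `envs_per_rank`. How GeneLab treats a total that does not divide evenly across ranks is not stated here, so the sketch simply rejects that case as an assumption.

```python
def envs_per_rank(num_envs: int, gpus: int) -> int:
    """Return the per-rank env count implied by a total --num_envs value.

    Assumption: the total is split evenly across ranks. Uneven splits are
    rejected here because their handling is not documented in this guide.
    """
    if num_envs < 1 or gpus < 1:
        raise ValueError("num_envs and gpus must both be positive")
    if num_envs % gpus != 0:
        raise ValueError(
            f"{num_envs} envs do not divide evenly across {gpus} ranks"
        )
    return num_envs // gpus


# The example from this section: 4096 total envs on 4 GPUs.
print(envs_per_rank(4096, 4))  # 1024
```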

4. Keeping logs reproducible

Set meaningful experiment_name and run_name values in RslRlOnPolicyRunnerCfg. GeneLab writes each run to:

logs/rsl_rl/<experiment>/<timestamp-or-run>/
├── params/env.json
├── params/agent.json
└── model_*.pt

Use --log_dir PATH only when you need an exact output directory, for example for a distributed relaunch or a scripted comparison.
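
Because each run stores its configuration under `params/`, two runs can be compared by diffing those JSON files. The following is a rough sketch, assuming only that `params/env.json` and `params/agent.json` contain JSON objects; the helpers `flatten` and `diff_params` are hypothetical and make no assumption about which keys GeneLab writes.

```python
import json
from pathlib import Path
from typing import Any, Dict, Tuple

MISSING = "<missing>"  # placeholder for a key present in only one run


def flatten(obj: Any, prefix: str = "") -> Dict[str, Any]:
    """Flatten nested dicts into dotted keys, e.g. {"a": {"b": 1}} -> {"a.b": 1}."""
    if not isinstance(obj, dict):
        return {prefix: obj}
    flat: Dict[str, Any] = {}
    for key, value in obj.items():
        name = f"{prefix}.{key}" if prefix else str(key)
        flat.update(flatten(value, name))
    return flat


def diff_params(run_a: str, run_b: str,
                name: str = "agent.json") -> Dict[str, Tuple[Any, Any]]:
    """Return {dotted_key: (value_in_a, value_in_b)} for every differing key."""
    a = flatten(json.loads((Path(run_a) / "params" / name).read_text()))
    b = flatten(json.loads((Path(run_b) / "params" / name).read_text()))
    return {
        key: (a.get(key, MISSING), b.get(key, MISSING))
        for key in sorted(a.keys() | b.keys())
        if a.get(key, MISSING) != b.get(key, MISSING)
    }
```

An empty result means the two runs used identical settings for that file. Pass `name="env.json"` to compare environment configuration instead.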

5. Profiling short runs first

genelab train TASK_ID \
  --prof \
  --prof-active 3 \
  --prof-repeat 1 \
  --max_iterations 10
genelab prof open logs/torch_profile

Profiler traces grow quickly. Start with a small active window and increase only after confirming the trace size is manageable.
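
One way to confirm that a trace is manageable is to measure the output directory after the short run. This sketch assumes traces land under `logs/torch_profile`, the path opened above. The helper names and the 500 MB default budget are illustrative choices, not GeneLab settings.

```python
from pathlib import Path


def dir_size_bytes(root: str) -> int:
    """Sum the sizes of all regular files under root, recursively."""
    return sum(p.stat().st_size for p in Path(root).rglob("*") if p.is_file())


def trace_within_budget(root: str = "logs/torch_profile",
                        max_mb: float = 500.0) -> bool:
    """Return True if the trace directory is no larger than max_mb.

    The 500 MB default is an arbitrary illustrative threshold.
    """
    return dir_size_bytes(root) <= max_mb * 1024 * 1024
```

Checking the size after a run with `--prof-active 3` gives a baseline to extrapolate from before widening the active window.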

Expected result

Every long run should have a passing smoke test, explicit env count semantics, inspectable params/*.json, and a known checkpoint replay command.