| Name | Modified | Size | Downloads / Week |
|---|---|---|---|
| README.md | 2022-10-03 | 7.0 kB | |
| v1.0.0b2 JAX Support and Hyperparameter Tuning.tar.gz | 2022-10-03 | 64.8 MB | |
| v1.0.0b2 JAX Support and Hyperparameter Tuning.zip | 2022-10-03 | 65.0 MB | |
| Totals: 3 Items | | 129.8 MB | 0 |

🎉 I am thrilled to announce the v1.0.0b2 CleanRL Beta Release. This new release comes with exciting new features. First, we now support JAX-based learning algorithms, which are usually faster than their PyTorch equivalents! The docs cover the new JAX-based DQN, TD3, and DDPG implementations.

Also, we now have preliminary support for hyperparameter tuning via Optuna (see docs), which is designed to help researchers find a single set of hyperparameters that works well across a family of games. The current API looks like this:

    :::python
    import optuna
    from cleanrl_utils.tuner import Tuner

    tuner = Tuner(
        script="cleanrl/ppo.py",
        metric="charts/episodic_return",
        metric_last_n_average_window=50,
        direction="maximize",
        aggregation_type="average",
        # per-environment [min, max] reference scores used to normalize returns
        target_scores={
            "CartPole-v1": [0, 500],
            "Acrobot-v1": [-500, 0],
        },
        # search space sampled per trial; plain values are held fixed
        params_fn=lambda trial: {
            "learning-rate": trial.suggest_loguniform("learning-rate", 0.0003, 0.003),
            "num-minibatches": trial.suggest_categorical("num-minibatches", [1, 2, 4]),
            "update-epochs": trial.suggest_categorical("update-epochs", [1, 2, 4, 8]),
            "num-steps": trial.suggest_categorical("num-steps", [5, 16, 32, 64, 128]),
            "vf-coef": trial.suggest_uniform("vf-coef", 0, 5),
            "max-grad-norm": trial.suggest_uniform("max-grad-norm", 0, 5),
            "total-timesteps": 100000,
            "num-envs": 16,
        },
        pruner=optuna.pruners.MedianPruner(n_startup_trials=5),
        sampler=optuna.samplers.TPESampler(),
    )
    tuner.tune(
        num_trials=100,
        num_seeds=3,
    )

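
To give a feel for how a single score per trial could come out of several games, here is a minimal sketch of min-max normalization followed by averaging. It assumes the `target_scores` ranges are used as the `[min, max]` bounds and that `aggregation_type="average"` means a plain mean across environments. The helpers `normalize` and `aggregate` and the trial returns are hypothetical and only for illustration; they are not the `Tuner` internals.

```python
from statistics import mean

# Reference [min, max] returns per environment, copied from the example above.
TARGET_SCORES = {
    "CartPole-v1": (0, 500),
    "Acrobot-v1": (-500, 0),
}


def normalize(env_id, episodic_return):
    """Map a raw return onto a common scale using the env's [min, max] range."""
    low, high = TARGET_SCORES[env_id]
    return (episodic_return - low) / (high - low)


def aggregate(returns_by_env):
    """Average the normalized scores across environments (assumed behavior)."""
    return mean(normalize(env_id, ret) for env_id, ret in returns_by_env.items())


# Hypothetical trial results, not real benchmark numbers.
trial_returns = {"CartPole-v1": 400.0, "Acrobot-v1": -100.0}
score = aggregate(trial_returns)
print(score)
```

Both hypothetical returns sit 80% of the way through their ranges, so each normalizes to 0.8 and so does the aggregate. Putting the games on a shared scale like this is what lets one number rank a set of hyperparameters across environments with very different reward magnitudes.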

Besides, we added support for new algorithms and environments:

- Isaac Gym support in PPO for GPU-accelerated robotics environments: ppo_continuous_action_isaacgym.py
- Random Network Distillation (RND) for highly exploratory environments: ppo_rnd_envpool.py

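
For readers new to RND, the core idea can be sketched in a few lines: a predictor is trained to imitate a frozen, randomly initialized target network, and the prediction error serves as an exploration bonus. The toy below uses small linear maps in place of neural networks, and every name in it (`make_linear`, `intrinsic_reward`, `train_predictor`, the two observations) is hypothetical. It is an illustration under those assumptions, not the code in ppo_rnd_envpool.py.

```python
import random


def make_linear(in_dim, out_dim, rng):
    """Random weight matrix standing in for a neural network (toy assumption)."""
    return [[rng.gauss(0.0, 1.0) for _ in range(in_dim)] for _ in range(out_dim)]


def forward(weights, obs):
    """Apply the linear map to an observation vector."""
    return [sum(w * x for w, x in zip(row, obs)) for row in weights]


def intrinsic_reward(target, predictor, obs):
    """Mean squared error between predictor and frozen target outputs."""
    t, p = forward(target, obs), forward(predictor, obs)
    return sum((pi - ti) ** 2 for pi, ti in zip(p, t)) / len(t)


def train_predictor(target, predictor, obs, lr=0.05):
    """One gradient step on the predictor only; the target is never updated."""
    t, p = forward(target, obs), forward(predictor, obs)
    for j, row in enumerate(predictor):
        grad_scale = 2.0 * (p[j] - t[j]) / len(t)
        for i, x in enumerate(obs):
            row[i] -= lr * grad_scale * x


rng = random.Random(0)
target = make_linear(4, 4, rng)      # fixed, randomly initialized network
predictor = make_linear(4, 4, rng)   # trained to imitate the target

familiar = [1.0, 0.0, 0.0, 0.0]      # hypothetical frequently visited state
novel = [0.0, 1.0, 0.0, 0.0]         # hypothetical rarely visited state

reward_before = intrinsic_reward(target, predictor, familiar)
for _ in range(200):
    train_predictor(target, predictor, familiar)
reward_familiar = intrinsic_reward(target, predictor, familiar)
reward_novel = intrinsic_reward(target, predictor, novel)
print(reward_before, reward_familiar, reward_novel)
```

After training only on the familiar state, its bonus shrinks toward zero while the bonus for the unseen state stays large, which is the signal that pushes an agent toward unexplored parts of the environment.
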
I would like to cordially thank the core dev members @dosssman, @yooceii, @Dipamc, and @kinalmehta for their efforts in helping maintain the CleanRL repository. I would also like to give a shout-out to our new contributors @cool-RR, @Howuhh, @jseppanen, @joaogui1, @kinalmehta, and @ALPH2H.
New CleanRL Supported Publications
Jiayi Weng, Min Lin, Shengyi Huang, Bo Liu, Denys Makoviichuk, Viktor Makoviychuk, Zichen Liu, Yufan Song, Ting Luo, Yukun Jiang, Zhongwen Xu, & Shuicheng Yan (2022). EnvPool: A Highly Parallel Reinforcement Learning Environment Execution Engine. In Thirty-sixth Conference on Neural Information Processing Systems Datasets and Benchmarks Track. https://openreview.net/forum?id=BubxnHpuMbG
New Features PR
- prototype jax with ddpg by @vwxyzjn in https://github.com/vwxyzjn/cleanrl/pull/187
- Isaac Gym Envs PPO updates by @vwxyzjn in https://github.com/vwxyzjn/cleanrl/pull/233
- JAX TD3 prototype by @joaogui1 in https://github.com/vwxyzjn/cleanrl/pull/225
- prototype jax with dqn by @kinalmehta in https://github.com/vwxyzjn/cleanrl/pull/222
- Poetry 1.2 by @vwxyzjn in https://github.com/vwxyzjn/cleanrl/pull/271
- Add rnd_ppo.py documentation and refactor by @yooceii in https://github.com/vwxyzjn/cleanrl/pull/151
- Hyperparameter optimization by @vwxyzjn in https://github.com/vwxyzjn/cleanrl/pull/228
- Update the hyperparameter optimization example script by @vwxyzjn in https://github.com/vwxyzjn/cleanrl/pull/268
Bug Fixes PR
- Td3 ddpg action bound fix by @dosssman in https://github.com/vwxyzjn/cleanrl/pull/211
- added gamma to reward normalization wrappers by @Howuhh in https://github.com/vwxyzjn/cleanrl/pull/209
- Seed envpool environment explicitly by @jseppanen in https://github.com/vwxyzjn/cleanrl/pull/238
- Fix PPO + Isaac Gym Benchmark Script by @vwxyzjn in https://github.com/vwxyzjn/cleanrl/pull/243
- Fix for noise sampling for the TD3 exploration by @dosssman in https://github.com/vwxyzjn/cleanrl/pull/260
Documentation PR
- Add a note on PPG's performance by @vwxyzjn in https://github.com/vwxyzjn/cleanrl/pull/199
- Clarify CleanRL is a non-modular library by @vwxyzjn in https://github.com/vwxyzjn/cleanrl/pull/200
- Fix documentation link by @vwxyzjn in https://github.com/vwxyzjn/cleanrl/pull/213
- JAX + DDPG docs fix by @vwxyzjn in https://github.com/vwxyzjn/cleanrl/pull/229
- Fix links in docs for ppo_continuous_action_isaacgym.py by @vwxyzjn in https://github.com/vwxyzjn/cleanrl/pull/242
- Fix docs (badge, TD3 + JAX, and DQN + JAX) by @vwxyzjn in https://github.com/vwxyzjn/cleanrl/pull/246
- Fix typos by @ALPH2H in https://github.com/vwxyzjn/cleanrl/pull/282
- Fix docs links in README.md by @vwxyzjn in https://github.com/vwxyzjn/cleanrl/pull/254
- chore: remove unused parameters in jax implementations by @kinalmehta in https://github.com/vwxyzjn/cleanrl/pull/264
Misc PR
- Show correct exception cause by @cool-RR in https://github.com/vwxyzjn/cleanrl/pull/205
- Remove pettingzoo's pistonball example by @vwxyzjn in https://github.com/vwxyzjn/cleanrl/pull/214
- Leverage CI to speed up poetry lock by @vwxyzjn in https://github.com/vwxyzjn/cleanrl/pull/235
- Ubuntu runner for poetry lock by @vwxyzjn in https://github.com/vwxyzjn/cleanrl/pull/236
- Remove the github pages CI in favor of vercel by @vwxyzjn in https://github.com/vwxyzjn/cleanrl/pull/241
- Clarify LICENSE info by @vwxyzjn in https://github.com/vwxyzjn/cleanrl/pull/253
- Update published paper citation by @vwxyzjn in https://github.com/vwxyzjn/cleanrl/pull/284
- Refactor dqn word choice by @vwxyzjn in https://github.com/vwxyzjn/cleanrl/pull/257
New Contributors
- @cool-RR made their first contribution in https://github.com/vwxyzjn/cleanrl/pull/205
- @Howuhh made their first contribution in https://github.com/vwxyzjn/cleanrl/pull/209
- @jseppanen made their first contribution in https://github.com/vwxyzjn/cleanrl/pull/238
- @joaogui1 made their first contribution in https://github.com/vwxyzjn/cleanrl/pull/225
- @kinalmehta made their first contribution in https://github.com/vwxyzjn/cleanrl/pull/222
- @ALPH2H made their first contribution in https://github.com/vwxyzjn/cleanrl/pull/282
Full Changelog: https://github.com/vwxyzjn/cleanrl/compare/v1.0.0b1...v1.0.0b2