| Name | Modified | Size |
|---|---|---|
| README.md | 2019-02-28 | 3.0 kB |
| v0.6.0.tar.gz | 2019-02-28 | 6.9 MB |
| v0.6.0.zip | 2019-02-28 | 7.1 MB |
Important enhancements
- Implicit Quantile Network (IQN) (https://arxiv.org/abs/1806.06923) agent is added: `chainerrl.agents.IQN`.
- Training DQN and its variants with N-step returns is supported.
- Resetting the env with `done=False` via the `info` dict is supported. When `env.step` returns an `info` dict with `info['needs_reset']=True`, the env is reset. This feature is useful for implementing a continuing env (see the sketch after this list).
- Evaluation with a fixed number of timesteps is supported (except in async training). This evaluation protocol is popular in Atari benchmarks. `examples/atari/dqn` now implements the same evaluation protocol as the Nature DQN paper.
- An example script that trains a DoubleDQN agent on a PyBullet-based robotic grasping env is added: `examples/grasping`.
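
To make the `needs_reset` convention concrete, here is a minimal sketch (not taken from the release) of a gym-style continuing env that never sets `done=True` but asks the training loop to reset it through `info['needs_reset']`; the class name and the 1000-step limit are hypothetical.

```python
import gym
import numpy as np


class ContinuingEnv(gym.Env):
    """A toy continuing env: episodes never terminate on their own, but the
    env periodically asks ChainerRL's training loop to reset it by returning
    info['needs_reset'] = True from step()."""

    observation_space = gym.spaces.Box(
        low=-1.0, high=1.0, shape=(1,), dtype=np.float32)
    action_space = gym.spaces.Discrete(2)

    def reset(self):
        self.t = 0
        return np.zeros(1, dtype=np.float32)

    def step(self, action):
        self.t += 1
        obs = np.random.uniform(-1.0, 1.0, size=1).astype(np.float32)
        reward = float(action)
        done = False  # a continuing env never signals termination via done
        # Hypothetical limit: request a reset every 1000 steps.
        info = {'needs_reset': self.t >= 1000}
        return obs, reward, done, info
```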
Important bugfixes
- The bug that PPO's `obs_normalizer` was not saved is fixed.
- The bug that NonbiasWeightDecay didn't work with newer versions of Chainer is fixed.
- The bug that the `argv` argument was ignored by `chainerrl.experiments.prepare_output_dir` is fixed.
Important destructive changes
- `train_agent_with_evaluation` and `train_agent_batch_with_evaluation` now require `eval_n_steps` (number of timesteps for each evaluation phase) and `eval_n_episodes` (number of episodes for each evaluation phase) to be explicitly specified, with exactly one of them being `None` (see the sketch after this list).
- `train_agent_with_evaluation`'s `max_episode_len` argument is renamed to `train_max_episode_len`.
- `ReplayBuffer.sample` now returns a list of lists of N experiences to support N-step returns.
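
To illustrate the new requirement, below is a hedged sketch of a DQN training call loosely following the ChainerRL quickstart; the env choice, network sizes, and other hyperparameter values are illustrative assumptions, and only `eval_n_steps`, `eval_n_episodes`, and the renamed `train_max_episode_len` are the arguments affected by this release.

```python
import chainer
import gym
import numpy as np
import chainerrl
from chainerrl import experiments, explorers, replay_buffer

env = gym.make('CartPole-v0')
eval_env = gym.make('CartPole-v0')
obs_size = env.observation_space.shape[0]
n_actions = env.action_space.n

# Simple fully-connected Q-function and DQN agent (illustrative settings).
q_func = chainerrl.q_functions.FCStateQFunctionWithDiscreteAction(
    obs_size, n_actions, n_hidden_channels=64, n_hidden_layers=2)
opt = chainer.optimizers.Adam(eps=1e-2)
opt.setup(q_func)
rbuf = replay_buffer.ReplayBuffer(capacity=10 ** 5)
explorer = explorers.ConstantEpsilonGreedy(
    epsilon=0.1, random_action_func=env.action_space.sample)
agent = chainerrl.agents.DQN(
    q_func, opt, rbuf, gamma=0.99, explorer=explorer,
    replay_start_size=500,
    phi=lambda x: x.astype(np.float32, copy=False))

experiments.train_agent_with_evaluation(
    agent=agent,
    env=env,
    eval_env=eval_env,
    outdir='results',
    steps=10 ** 5,
    eval_interval=10 ** 4,
    # Exactly one of eval_n_steps / eval_n_episodes must be None.
    eval_n_steps=None,
    eval_n_episodes=10,
    # Renamed from max_episode_len in this release.
    train_max_episode_len=200,
)
```

To evaluate by a fixed number of timesteps instead of episodes, swap the pair, e.g. `eval_n_steps=125000, eval_n_episodes=None`.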
All updates
Enhancement
- Implicit quantile networks (IQN) (#288)
- Adds N-step learning for DQN-based agents. (#317)
- Replay warning (#321)
- Close envs in async training (#343)
- Allow envs to send a 'needs_reset' signal (#356)
- Changes variable names in train_agent_with_evaluation (#358)
- Use chainer.dataset.concat_examples in batch_states (#366)
- Implements Time-based evaluations (#367)
Documentation
- Add long description for pypi (#357, thanks @ljvmiranda921!)
- A small change to the installation documentation (#369)
- Adds a link to the ChainerRL visualizer from the main repository (#370)
- adds implicit quantile networks to readme (#393)
- Fix DQN.update's docstring (#394)
Examples
- Grasping example (#371)
- Adds Deepmind Scores to README in DQN Example (#383)
Testing
- Fix TestTrainAgentAsync (#363)
- Use AbnormalExitCodeWarning for nonzero exitcode warnings (#378)
- Avoid random test failures due to asynchronousness (#380)
- Drop hacking (#381)
- Avoid gym 0.11.0 in Travis (#396)
- Stabilize and speed up A3C tests (#401)
- Reduce ACER's test cases and maximum timesteps (#404)
- Add tests of IQN examples (#405)
Bugfixes
- Avoid UnicodeDecodeError in setup.py (#365)
- Save and load obs_normalizer of PPO (#377)
- Make NonbiasWeightDecay work again (#390)
- bug fix (#391, thanks @tappy27!)
- Fix episodic training of DDPG (#399)
- Fix PGT's training (#400)
- Fix ResidualDQN's training (#402)