Add TNPG documentation #2162

Open
wants to merge 9 commits into
base: master
1 change: 1 addition & 0 deletions docs/index.md
@@ -62,6 +62,7 @@ and how to implement new MDPs and new algorithms.
RL2 <user/algo_rl2>
SAC <user/algo_sac>
TD3 <user/algo_td3>
TNPG <user/algo_tnpg>
TRPO <user/algo_trpo>
REINFORCE <user/algo_vpg>

44 changes: 44 additions & 0 deletions docs/user/algo_tnpg.md
@@ -0,0 +1,44 @@
# Truncated Natural Policy Gradient (TNPG)

```eval_rst
+-------------------+--------------------------------------------------------------------------------------------------------------+
| **Paper**         | Benchmarking Deep Reinforcement Learning for Continuous Control :cite:`duan2016benchmarking`, A Natural      |
|                   | Policy Gradient :cite:`10.5555/2980539.2980738`                                                              |
+-------------------+--------------------------------------------------------------------------------------------------------------+
| **Framework(s)**  | .. figure:: ./images/tf.png                                                                                  |
|                   |    :scale: 10%                                                                                               |
|                   |    :class: no-scaled-link                                                                                    |
|                   |                                                                                                              |
|                   |    TensorFlow                                                                                                |
+-------------------+--------------------------------------------------------------------------------------------------------------+
| **API Reference** | `garage.tf.algos.TNPG <../_autoapi/garage/tf/algos/index.html#garage.tf.algos.TNPG>`_                        |
+-------------------+--------------------------------------------------------------------------------------------------------------+
| **Code**          | `garage/tf/algos/tnpg.py <https://github.com/rlworkgroup/garage/blob/master/src/garage/tf/algos/tnpg.py>`_   |
+-------------------+--------------------------------------------------------------------------------------------------------------+
```

```eval_rst
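Truncated Natural Policy Gradient (TNPG) builds on the Natural Policy Gradient, which optimizes a policy to maximize the expected discounted return by following the natural gradient, that is, the ordinary policy gradient preconditioned by the inverse of the Fisher information matrix. TNPG uses a conjugate gradient algorithm to compute the natural policy gradient direction without explicitly forming or inverting that matrix, which cuts the computation cost when the policy has a high-dimensional parameter vector. See :cite:`duan2016benchmarking` for more details.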
```
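
The conjugate gradient step described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the garage implementation: the helper names (`dot`, `conjugate_gradient`, `natural_gradient_step`, `toy_fvp`) are hypothetical, the Fisher matrix is a fixed 2x2 toy matrix accessed only through a matrix-vector product, and the step is scaled so that the quadratic KL approximation equals `max_kl_step`. Capping the number of conjugate gradient iterations is what makes the natural gradient "truncated".

```python
import math


def dot(a, b):
    """Inner product of two equal-length lists of floats."""
    return sum(x * y for x, y in zip(a, b))


def conjugate_gradient(fvp, g, iters=10, tol=1e-10):
    """Approximately solve F x = g using only products fvp(v) = F v.

    F is assumed symmetric positive definite. Stopping after `iters`
    iterations gives a truncated (approximate) solution.
    """
    x = [0.0] * len(g)
    r = list(g)  # residual g - F x, with x = 0 initially
    p = list(g)  # current search direction
    rs_old = dot(r, r)
    for _ in range(iters):
        if rs_old < tol:
            break
        fp = fvp(p)
        alpha = rs_old / dot(p, fp)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * fi for ri, fi in zip(r, fp)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x


def natural_gradient_step(fvp, g, max_kl_step=0.01):
    """Scale the CG direction so that 0.5 * step^T F step == max_kl_step."""
    direction = conjugate_gradient(fvp, g)
    scale = math.sqrt(2.0 * max_kl_step / dot(direction, fvp(direction)))
    return [scale * d for d in direction]


def toy_fvp(v):
    """Hypothetical Fisher-vector product for F = diag(2.0, 0.5)."""
    return [2.0 * v[0], 0.5 * v[1]]


# Toy policy gradient g = [1, 1]; the exact solution of F x = g is [0.5, 2.0].
step = natural_gradient_step(toy_fvp, [1.0, 1.0])
```

In a real policy the Fisher-vector product would come from automatic differentiation of the KL divergence rather than from an explicit matrix, so the cost per iteration stays linear in the number of parameters.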

## Default Parameters

```py
discount=0.99,
gae_lambda=0.98,
lr_clip_range=0.01,
max_kl_step=0.01,
policy_ent_coeff=0.0,
entropy_method='no_entropy',
```
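
To give a feel for what `discount` and `gae_lambda` control, here is a minimal sketch of generalized advantage estimation in plain Python. It assumes that `gae_lambda` is the usual GAE λ parameter and uses a hypothetical helper name (`gae_advantages`); it is not taken from the garage source, which may compute advantages differently in detail.

```python
def gae_advantages(rewards, values, discount=0.99, gae_lambda=0.98):
    """Compute GAE advantages for a single trajectory.

    `values` must have len(rewards) + 1 entries; the last entry is the
    value estimate used to bootstrap after the final step (0.0 if the
    episode terminated).
    """
    advantages = [0.0] * len(rewards)
    running = 0.0
    # Work backwards so each advantage reuses the one after it.
    for t in reversed(range(len(rewards))):
        # One-step temporal-difference error at time t.
        delta = rewards[t] + discount * values[t + 1] - values[t]
        running = delta + discount * gae_lambda * running
        advantages[t] = running
    return advantages


# Toy trajectory: three unit rewards and a zero value baseline.
advantages = gae_advantages([1.0, 1.0, 1.0], [0.0, 0.0, 0.0, 0.0])
```

With these defaults each later TD error is down-weighted by `discount * gae_lambda = 0.9702` per step, so values of `gae_lambda` close to 1 trade lower bias for higher variance in the advantage estimates.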

## References

```eval_rst
.. bibliography:: references.bib
   :style: unsrt
   :filter: docname in docnames
```
----

*This page was authored by Nicole Shin Ying Ng ([@nicolengsy](https://github.com/nicolengsy)).*
23 changes: 23 additions & 0 deletions docs/user/references.bib
@@ -116,6 +116,29 @@ @article{2009koberpolicy
month_numeric = {6}
}

@misc{duan2016benchmarking,
title={Benchmarking Deep Reinforcement Learning for Continuous Control},
author={Yan Duan and Xi Chen and Rein Houthooft and John Schulman and Pieter Abbeel},
year={2016},
eprint={1604.06778},
archivePrefix={arXiv},
primaryClass={cs.LG}
}

@inproceedings{10.5555/2980539.2980738,
author = {Kakade, Sham},
title = {A Natural Policy Gradient},
year = {2001},
publisher = {MIT Press},
address = {Cambridge, MA, USA},
abstract = {We provide a natural gradient method that represents the steepest descent direction based on the underlying structure of the parameter space. Although gradient methods cannot make large changes in the values of the parameters, we show that the natural gradient is moving toward choosing a greedy optimal action rather than just a better action. These greedy optimal actions are those that would be chosen under one improvement step of policy iteration with approximate, compatible value functions, as defined by Sutton et al. [9]. We then show drastic performance improvements in simple MDPs and in the more challenging MDP of Tetris.},
booktitle = {Proceedings of the 14th International Conference on Neural Information Processing Systems: Natural and Synthetic},
pages = {1531--1538},
numpages = {8},
location = {Vancouver, British Columbia, Canada},
series = {NIPS'01}
}

@misc{finn2017modelagnostic,
title={Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks},
author={Chelsea Finn and Pieter Abbeel and Sergey Levine},