Simple Distributed Reinforcement Learning FrameWork (SRL)

SRL is a simple distributed reinforcement learning framework.

It has the following features:

  • Support for distributed reinforcement learning
  • Automatic adjustment of the interface between the environment and the algorithm
  • Support for Gym/Gymnasium environments
  • Provides customizable environment classes
  • Provides customizable reinforcement learning algorithm classes
  • Provides well-known reinforcement learning algorithms
  • Support for new algorithms (planned)

Document

https://pocokhc.github.io/simple_distributed_rl/

Algorithm explanation article (Qiita)

The articles are written in Japanese.

https://qiita.com/pocokhc/items/a2f1ba993c79fdbd4b4d

1. Install

1-1. Related libraries

Required library

pip install numpy

Optional libraries

Depending on the features you use, the following additional libraries are required.

  • If you use an algorithm that requires Tensorflow
    • tensorflow
    • tensorflow-probability
  • If you use an algorithm that requires Torch
    • torch
  • When using image-related functions
    • pillow
    • opencv-python
    • pygame
  • When using history statistics
    • pandas
    • matplotlib
  • When using the OpenAI Gym environment
    • gym or gymnasium
    • pygame
  • To view hardware statistics
    • psutil
    • pynvml
  • When using cloud/network distributed learning
    • redis
    • pika
    • paho-mqtt
  • To manage your learning
    • mlflow

A bulk installation command (excluding Tensorflow, Torch, the cloud distributed learning libraries, and MLflow) is:

pip install matplotlib pillow opencv-python pygame pandas gymnasium psutil pynvml

1-2. Install/Download

This framework can be installed or downloaded from GitHub.

Install

pip install git+https://github.com/pocokhc/simple_distributed_rl

or

git clone https://github.com/pocokhc/simple_distributed_rl.git
cd simple_distributed_rl
pip install .
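
After installing, a quick sanity check is to import the package and print its version (the same attribute used in the download example below):

python -c "import srl; print(srl.__version__)"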

Download

Alternatively, you can simply download the repository and add its directory to the Python path, as in the example below.

# Download the SRL repository
git clone https://github.com/pocokhc/simple_distributed_rl.git

# SRL import example
import os
import sys

assert os.path.isdir("./simple_distributed_rl/srl/")  # path where SRL was downloaded
sys.path.insert(0, "./simple_distributed_rl/")

import srl
print(srl.__version__)

2. Usage

Here's a simple usage example.

import srl
from srl.algorithms import ql  # import the QL algorithm


def main():
    # Creating a runner
    runner = srl.Runner("Grid", ql.Config())

    # train
    runner.train(timeout=10)

    # Evaluation of training results
    rewards = runner.evaluate()
    print(f"evaluate episodes: {rewards}")

    # --- Visualization example
    #  ("pip install opencv-python pillow pygame" is required to run animation_save_gif)
    runner.animation_save_gif("Grid.gif")


if __name__ == "__main__":
    main()

[Grid.gif: animation of the trained agent on the Grid environment]
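
The distributed flow described in section 3 can be tried with the Runner's multi-process training; a minimal sketch, assuming train_mp accepts the same timeout-style arguments as train (check the documentation for the exact signature):

import srl
from srl.algorithms import ql

runner = srl.Runner("Grid", ql.Config())
runner.train_mp(timeout=10)  # assumed multi-process variant of train()
print(runner.evaluate())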

For other usage, please see the documentation.

(Trial implementation) Learning management using MLflow

There is an example in examples/sample_mlflow.py.
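
The bundled sample is the authoritative reference; as a rough illustration, tracking can also be wired up by hand with the standard MLflow API. A minimal sketch (the experiment name and metric name here are illustrative, not part of SRL):

import mlflow
import numpy as np
import srl
from srl.algorithms import ql

runner = srl.Runner("Grid", ql.Config())

mlflow.set_experiment("srl-grid")  # illustrative experiment name
with mlflow.start_run():
    runner.train(timeout=10)
    rewards = runner.evaluate()
    # log the mean evaluation reward (metric name is illustrative)
    mlflow.log_metric("eval_reward_mean", float(np.mean(rewards)))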

3. Framework Overview

  • Sequence flow

[overview-sequence.drawio.png: sequence flow diagram]

  • Distributed flow

[overview-mp.drawio.png: distributed flow diagram]
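
In simplified pseudocode (an illustrative sketch of the diagram above, not the exact implementation; remote_memory and parameter are placeholder names), actors that collect experience run separately from the trainer that updates parameters:

# --- Actor process (one or more)
env.setup()
worker.setup()
while not training_end:
    action = worker.policy()        # act with a local copy of the parameters
    env.step(action)
    remote_memory.add(batch)        # send collected experience to shared memory
    parameter.sync()                # periodically pull the latest parameters

# --- Trainer process
trainer.setup()
while not training_end:
    batch = remote_memory.sample()  # sample experience from shared memory
    trainer.train(batch)            # update the shared parameters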

  • Simplified pseudocode (train)

For implementation details, see 'Making a Custom algorithm'.

# Initialize learning units
env.setup()
worker.setup()
trainer.setup()

for episode in range(N):
    # 1 episode initialization phase
    env.reset()
    worker.reset()

    # 1 episode loop
    while not env.done:
        env.render()  # env render

        # get action
        worker.ready_policy()
        action = worker.policy()
        worker.render()  # worker render

        # run 1 step of the environment
        env.step(action)
        worker.on_step()

        # train phase
        trainer.train()

    # drawing after the end of one episode
    env.render()
    worker.render()

# End of learning units
env.teardown()
worker.teardown()
trainer.teardown()

4. Custom environment and algorithms

Please refer to the documentation for creating your own environments and algorithms.
(The contents are in Japanese.)

5. Algorithms

ModelFree

ValueBase

| Algorithm  | Observation | Action   | Tensorflow | Torch | ProgressRate | Note |
|------------|-------------|----------|------------|-------|--------------|------|
| QL         | Discrete    | Discrete | -          | -     | 100%         | Basic Q Learning |
| DQN        | Continuous  | Discrete | ✔          | ✔     | 100%         | |
| C51        | Continuous  | Discrete | ✔          | -     | 99%          | CategoricalDQN |
| Rainbow    | Continuous  | Discrete | ✔          | ✔     | 100%         | |
| R2D2       | Continuous  | Discrete | ✔          | -     | 100%         | |
| Agent57    | Continuous  | Discrete | ✔          | ✔     | 100%         | |
| SND        | Continuous  | Discrete | ✔          | -     | 100%         | |
| Go-Explore | Continuous  | Discrete | ✔          | -     | 100%         | DQN base, R2D3 memory base |

PolicyBase/ActorCritic

| Algorithm     | Observation | Action     | Tensorflow | Torch | ProgressRate | Note |
|---------------|-------------|------------|------------|-------|--------------|------|
| VanillaPolicy | Discrete    | Both       | -          | -     | 100%         | |
| A3C/A2C       | -           | -          | -          | -     | -            | |
| TRPO          | -           | -          | -          | -     | -            | |
| PPO           | Continuous  | Both       | ✔          | -     | 100%         | |
| DDPG/TD3      | Continuous  | Continuous | ✔          | -     | 100%         | |
| SAC           | Continuous  | Both       | ✔          | -     | 100%         | |

AlphaSeries

| Algorithm        | Observation | Action   | Tensorflow | Torch | ProgressRate | Note |
|------------------|-------------|----------|------------|-------|--------------|------|
| MCTS             | Discrete    | Discrete | -          | -     | 100%         | MDP base |
| AlphaZero        | Image       | Discrete | ✔          | -     | 100%         | MDP base |
| MuZero           | Image       | Discrete | ✔          | -     | 100%         | MDP base |
| StochasticMuZero | Image       | Discrete | ✔          | -     | 100%         | MDP base |

ModelBase

| Algorithm | Observation | Action   | Framework | ProgressRate |
|-----------|-------------|----------|-----------|--------------|
| DynaQ     | Discrete    | Discrete | -         | 100%         |

WorldModels

| Algorithm   | Observation | Action   | Tensorflow                   | Torch | ProgressRate |
|-------------|-------------|----------|------------------------------|-------|--------------|
| WorldModels | Continuous  | Discrete | ✔                            | -     | 100%         |
| PlaNet      | Continuous  | Discrete | ✔ (+tensorflow-probability)  | -     | 100%         |
| Dreamer     | Continuous  | Both     | -                            | -     | merged into DreamerV3 |
| DreamerV2   | Continuous  | Both     | -                            | -     | merged into DreamerV3 |
| DreamerV3   | Continuous  | Both     | ✔ (+tensorflow-probability)  | -     | 100%         |

Offline

| Algorithm | Observation | Action   | Framework | ProgressRate |
|-----------|-------------|----------|-----------|--------------|
| CQL       | Discrete    | Discrete | -         | 0%           |

Original

| Algorithm     | Observation | Action   | Type      | Tensorflow | Torch | ProgressRate | Note |
|---------------|-------------|----------|-----------|------------|-------|--------------|------|
| QL_agent57    | Discrete    | Discrete | ValueBase | -          | -     | 80%          | QL + Agent57 |
| Agent57_light | Continuous  | Discrete | ValueBase | ✔          | ✔     | 100%         | Agent57 - (LSTM,MultiStep) |
| SearchDynaQ   | Discrete    | Discrete | ModelBase | -          | -     | 100%         | original |
| GoDynaQ       | Discrete    | Discrete | ModelBase | -          | -     | 99%          | original |
| GoDQN         | Continuous  | Discrete | ValueBase | ✔          | -     | 90%          | original |

6. Online Distributed Learning

For information on distributed learning over a network, please refer to the documentation. (The contents are in Japanese.)

For information on linking with cloud services, please refer to the Qiita article. (The contents are in Japanese.)
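
As a rough orientation only (the module path and function names below are assumptions based on the linked documentation and may differ between versions; the Redis host is a placeholder), an online distributed run is driven through a Redis server roughly like this:

import srl
from srl.algorithms import ql
from srl.runner.distribution import RedisParameters  # assumed module path

runner = srl.Runner("Grid", ql.Config())
# assumed API: distribute training through a Redis server
runner.train_distribution(RedisParameters(host="localhost"), timeout=30)
print(runner.evaluate())

In the real setup, separate actor and trainer processes connect to the same Redis server; see the documentation for details.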

7. Development environment

See the dockers folder.

  • PC1
    • Windows 11
    • CPU x1: Core i7-8700 3.2GHz
    • GPU x1: NVIDIA GeForce GTX 1060 3GB
    • Memory: 48GB
  • PC2
    • Windows 11
    • CPU x1: Core i9-12900 2.4GHz
    • GPU x1: NVIDIA GeForce RTX 3060 12GB
    • Memory: 32GB