Stable Baselines / User Guide / Using Custom Environments


Stable Baselines official documentation, Chinese translation: Github | CSDN
This is an attempt at translating the official documentation. My skills are limited, so please point out any mistakes.

To use the RL baselines with a custom environment, you only need to follow the gym interface.

That is, your environment must implement the following methods (and inherit from the OpenAI Gym class):

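If you use images as input, the input values must lie in [0, 255], because observations are normalized when a CNN policy is used (they are divided by 255 so that the values fall in [0, 1]).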
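The normalization in the note above is a plain division by 255. The sketch below mimics it in pure Python so the arithmetic is visible; `scale_pixels` is a hypothetical helper, not part of Stable Baselines, which performs this scaling internally when a CNN policy is used.

```python
def scale_pixels(pixels):
    """Hypothetical helper: scale raw pixel values from [0, 255] into [0, 1].

    It only mirrors the division by 255 described above; it is NOT the
    Stable Baselines implementation.
    """
    for p in pixels:
        if not 0 <= p <= 255:
            # Values outside [0, 255] would end up outside [0, 1] after scaling
            raise ValueError(f"pixel value {p} is outside [0, 255]")
    return [p / 255.0 for p in pixels]


print(scale_pixels([0, 51, 255]))  # [0.0, 0.2, 1.0]
```

This is also why an image environment should not normalize its own observations: values already in [0, 1] would be divided by 255 a second time, leaving every input at or below 1/255 (about 0.0039).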

import gym
import numpy as np
from gym import spaces


class CustomEnv(gym.Env):
    """Custom Environment that follows gym interface"""
    metadata = {'render.modes': ['human']}

    # arg1, arg2, ... and the UPPER_CASE constants below are placeholders to fill in
    def __init__(self, arg1, arg2, ...):
        super(CustomEnv, self).__init__()
        # Define action and observation space
        # They must be gym.spaces objects
        # Example when using discrete actions:
        self.action_space = spaces.Discrete(N_DISCRETE_ACTIONS)
        # Example for using image as input:
        self.observation_space = spaces.Box(low=0, high=255,
                                            shape=(HEIGHT, WIDTH, N_CHANNELS), dtype=np.uint8)

    def step(self, action):
        # Execute one time step within the environment
        ...

    def reset(self):
        # Reset the state of the environment to an initial state
        ...

    def render(self, mode='human', close=False):
        # Render the environment to the screen
        ...
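To see how the template's methods fit together, here is a filled-in toy example: a hypothetical `CorridorEnv` in which the agent walks along a one-dimensional corridor toward a goal cell. It follows the older gym convention that this template uses, where `reset()` returns only the observation and `step()` returns `(observation, reward, done, info)`. As an assumption made purely so the sketch runs without gym installed, it does not subclass `gym.Env` and defines no `action_space`/`observation_space`; a real environment used with Stable Baselines needs both, as described above.

```python
class CorridorEnv:
    """Hypothetical toy environment mirroring the gym.Env method signatures.

    A real Stable Baselines environment must subclass gym.Env and set
    self.action_space / self.observation_space to gym.spaces objects
    (here that would be e.g. spaces.Discrete(2)); both are omitted so this
    sketch runs without gym installed.
    """

    metadata = {'render.modes': ['human']}

    def __init__(self, size=5, max_steps=20):
        self.size = size            # positions 0 .. size - 1, goal at size - 1
        self.max_steps = max_steps  # the episode is cut off after this many steps
        self.pos = 0
        self.steps = 0

    def reset(self):
        # Older gym API: return the initial observation only
        self.pos = 0
        self.steps = 0
        return self.pos

    def step(self, action):
        # Action 0 moves left, action 1 moves right; the walls clamp the position
        if action not in (0, 1):
            raise ValueError(f"invalid action {action!r}")
        move = 1 if action == 1 else -1
        self.pos = min(max(self.pos + move, 0), self.size - 1)
        self.steps += 1
        reached_goal = self.pos == self.size - 1
        reward = 1.0 if reached_goal else 0.0
        done = reached_goal or self.steps >= self.max_steps
        info = {"steps": self.steps}
        # Older gym API: (observation, reward, done, info)
        return self.pos, reward, done, info

    def render(self, mode='human', close=False):
        # Draw the corridor as text, e.g. "A...G" (A = agent, G = goal)
        cells = ["."] * self.size
        cells[-1] = "G"
        cells[self.pos] = "A"
        print("".join(cells))


# Usage: always move right until the episode ends
env = CorridorEnv(size=4)
obs = env.reset()
done = False
total = 0.0
while not done:
    obs, reward, done, info = env.step(1)
    total += reward
print(obs, total, info)  # 3 1.0 {'steps': 3}
```

Keeping the reward, termination check and `info` dictionary inside `step()` matches the template above, so turning this sketch into a real environment is mostly a matter of adding the gym base class and the two space definitions.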

You can then use it to train an RL agent:

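from stable_baselines import A2C
from stable_baselines.common.policies import CnnPolicy
from stable_baselines.common.vec_env import DummyVecEnv

# Instantiate and wrap the env
env = DummyVecEnv([lambda: CustomEnv(arg1, ...)])
# Define and train the agent
model = A2C(CnnPolicy, env).learn(total_timesteps=1000)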
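`DummyVecEnv` receives a list of zero-argument callables that build environments, not environment instances, which is why the example wraps the constructor in a `lambda`. When several copies are created in a loop, a bare lambda runs into Python's late-binding closures. The plain-Python sketch below shows the pitfall and a factory-function workaround; `FakeEnv` and `make_env` are hypothetical names, and `FakeEnv` is a stand-in so the code runs without Stable Baselines or gym.

```python
class FakeEnv:
    """Hypothetical stand-in for CustomEnv; it only records its argument."""

    def __init__(self, arg1):
        self.arg1 = arg1


# Pitfall: each lambda looks up `i` when it is called, not when it is created,
# so every factory built in this comprehension sees the final value of `i`.
late_bound = [lambda: FakeEnv(i) for i in range(3)]
print([fn().arg1 for fn in late_bound])  # [2, 2, 2]


def make_env(arg1):
    """Hypothetical factory: returns a zero-argument callable that builds one env."""
    def _init():
        return FakeEnv(arg1)
    return _init


# Each factory now captures its own arg1, so the environments differ as intended.
env_fns = [make_env(i) for i in range(3)]
print([fn().arg1 for fn in env_fns])  # [0, 1, 2]
```

With Stable Baselines installed, a list like `env_fns` (built around the real `CustomEnv`) would be passed as `DummyVecEnv(env_fns)`.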