[Bug]: evaluate_policy called multiple times for vectorized environments #1912

LukasFehring · 2024-04-26T10:46:33Z

🐛 Bug

When calling

from stable_baselines3.common.evaluation import evaluate_policy
def custom_callback(locals, globals):
    pass

evaluate_policy(model, vec_env, callback=custom_callback)  # model and vec_env as in the reproduction below

with a VecEnv, the callback gets executed once for each of the environments at every step. However, the locals dict also contains the aggregated results for all environments. Therefore you have to manually check which environment the callback was called for, or only act on every n_envs-th call.

To Reproduce

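import gymnasium as gym  # SB3 >= 2.0 targets Gymnasium; legacy gym envs only work through a compatibility shim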
from stable_baselines3 import PPO
from stable_baselines3.common.vec_env import DummyVecEnv
from stable_baselines3.common.evaluation import evaluate_policy

# Define a simple callback function
def callback(_locals, _globals):
    pass

# Function to create multiple environments
def make_env():
    return gym.make('CartPole-v1')

# Number of environments
num_envs = 4
envs = [make_env for _ in range(num_envs)]

# Create vectorized environment
vec_env = DummyVecEnv(envs)

# Create a model
model = PPO("MlpPolicy", vec_env, verbose=1)

# Train the model
model.learn(total_timesteps=5000)

# Evaluate the policy
mean_reward, std_reward = evaluate_policy(model, vec_env, n_eval_episodes=10, callback=callback)

print("Mean reward:", mean_reward, "STD reward:", std_reward)
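To make the repeated calls visible, the empty callback above can be swapped for one that records which sub-environment each call refers to. This is a minimal sketch, assuming only that `evaluate_policy` passes its `locals()` dict as the first argument and that this dict contains the per-env index `i` and the unpacked `reward` and `done` values, as in the evaluation.py excerpt quoted later in this thread. The class name `PerEnvRecorder` is hypothetical.

```python
from collections import defaultdict


class PerEnvRecorder:
    """Hypothetical evaluate_policy callback that logs one entry per call.

    Assumes the locals dict passed by evaluate_policy contains
    "i" (sub-environment index), "reward" and "done".
    """

    def __init__(self):
        self.n_calls = 0
        # env index -> list of (reward, done) tuples, in call order
        self.history = defaultdict(list)

    def __call__(self, locals_, globals_):
        self.n_calls += 1
        env_idx = locals_["i"]  # which sub-environment this call is about
        self.history[env_idx].append((locals_["reward"], locals_["done"]))


# Usage with the reproduction above (not executed here):
#   recorder = PerEnvRecorder()
#   evaluate_policy(model, vec_env, n_eval_episodes=10, callback=recorder)
#   print(recorder.n_calls, {k: len(v) for k, v in recorder.history.items()})
```

With `num_envs = 4`, each vectorized step should produce up to four calls, one per sub-environment that is still collecting evaluation episodes, which is the behavior described in this report.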

Relevant log output / Error message

No response

System Info

OS: Linux-3.10.0-1160.2.1.el7.x86_64-x86_64-with-glibc2.17 # 1 SMP Tue Oct 20 15:39:03 UTC 2020
Python: 3.8.19
Stable-Baselines3: 2.4.0a0
PyTorch: 2.3.0+cu121
GPU Enabled: False
Numpy: 1.24.4
Cloudpickle: 3.0.0
Gymnasium: 0.29.1
OpenAI Gym: 0.23.0

Checklist

My issue does not relate to a custom gym environment. (Use the custom gym env template instead)
I have checked that there is no similar issue in the repo
I have read the documentation
I have provided a minimal and working example to reproduce the bug
I've used the markdown code blocks for both code and stack traces.


araffin · 2024-04-26T11:46:35Z

Hello,
What is your use case / expected behavior?

The for loop also decomposes the info per env:

stable-baselines3 repository, stable_baselines3/common/evaluation.py, lines 99 to 106 at commit 35eccaf:

    # unpack values so that the callback can access the local variables
    reward = rewards[i]
    done = dones[i]
    info = infos[i]
    episode_starts[i] = done

    if callback is not None:
        callback(locals(), globals())
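The excerpt runs inside a per-environment loop, which is why the callback fires once per sub-environment on every vectorized step while `locals()` still holds the full `rewards`, `dones` and `infos` arrays. The stdlib-only sketch below is a simplified stand-in for that loop, not the SB3 source: the function name `simulate_eval_loop`, the fixed step count and the fake rewards are hypothetical, and it leaves out the episode counting the real function performs.

```python
def simulate_eval_loop(n_envs, n_steps, callback):
    """Hypothetical stand-in for the per-env loop inside evaluate_policy."""
    for step in range(n_steps):
        # A vectorized step returns one entry per sub-environment.
        rewards = [float(step)] * n_envs
        dones = [False] * n_envs
        infos = [{} for _ in range(n_envs)]
        for i in range(n_envs):
            # unpack values so that the callback can access them via locals()
            reward = rewards[i]
            done = dones[i]
            info = infos[i]
            if callback is not None:
                callback(locals(), globals())


calls = []


def record(locals_, globals_):
    # i is the current sub-environment; rewards is the full per-step array
    calls.append((locals_["i"], len(locals_["rewards"])))


simulate_eval_loop(n_envs=4, n_steps=2, callback=record)
print(calls)
# [(0, 4), (1, 4), (2, 4), (3, 4), (0, 4), (1, 4), (2, 4), (3, 4)]
```

Each call sees the same four-element `rewards` array, so only `i` (or the unpacked `reward`, `done`, `info`) tells the calls apart.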

LukasFehring · 2024-05-07T07:52:57Z

How so? Both the globals and the locals contain information on every environment in the vectorized environment. How am I supposed to determine which env the callback is being called for?

araffin · 2024-05-07T08:03:31Z

There is the local variable "i", the index of the sub-environment that the current call refers to.
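With `i`, a callback can tell the calls apart, for example to run some logic only once per vectorized step. One caveat, hedged: in the SB3 source around the time of this thread, the callback appears to be called only for sub-environments that have not yet finished their share of evaluation episodes, so a check such as `i == n_envs - 1` may silently skip steps near the end of an evaluation. The sketch below instead treats a call as the start of a new step whenever `i` does not increase. The wrapper name `once_per_step` is hypothetical, and the approach assumes `i` only increases within a step, which holds if the loop is `for i in range(n_envs)`.

```python
def once_per_step(fn):
    """Hypothetical wrapper: run fn(locals_, globals_) once per vectorized step.

    A new step is detected when the env index "i" does not increase compared
    with the previous call, so skipped sub-environments do not break it.
    fn should read the aggregated arrays (rewards, dones, infos), because the
    unpacked reward/done/info only describe sub-environment i.
    """
    last_i = None

    def callback(locals_, globals_):
        nonlocal last_i
        i = locals_["i"]
        if last_i is None or i <= last_i:
            fn(locals_, globals_)  # first call of a new vectorized step
        last_i = i

    return callback


seen = []
cb = once_per_step(lambda locals_, globals_: seen.append(locals_["step"]))

# Simulated call pattern with fake locals dicts ("step" is a made-up key):
# env 3 is skipped from step 2 on, and only env 2 is still active at step 3.
pattern = {0: [0, 1, 2, 3], 1: [0, 1, 2, 3], 2: [0, 1, 2], 3: [2]}
for step, env_indices in pattern.items():
    for i in env_indices:
        cb({"i": i, "step": step}, {})

print(seen)  # [0, 1, 2, 3]
```

In this pattern a naive `i == 3` check would fire only for steps 0 and 1. With SB3 the wrapper would be passed as `callback=once_per_step(my_fn)`, where `my_fn` is a placeholder for your own function.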

LukasFehring · 2024-05-07T08:07:03Z

Ah, OK, sorry then.
Documentation of locals and globals would probably help to find that! :)

araffin · 2024-05-10T13:44:32Z

> Documentation of locals and globals would probably help to find that! :)

Feel free to open a PR that updates the docs ;)

LukasFehring added the "bug" (Something isn't working) label on Apr 26, 2024.

araffin added the "documentation" (Improvements or additions to documentation) and "help wanted" (Help from contributors is welcomed) labels and removed the "bug" label on May 7, 2024.
