Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GenerativeBeliefMDP isterminal is only true when the support is entirely terminal #544

Open
FlyingWorkshop opened this issue Apr 1, 2024 · 2 comments
Labels
decision enhancement question quick This task shouldn't take too long

Comments

@FlyingWorkshop
Copy link
Member

FlyingWorkshop commented Apr 1, 2024

This isn't an error per say, but isterminal in GenerativeBeliefMDP is only true when the entire support is terminal, but when @gen is called, we sample single state from the belief, meaning isterminal(bmdp, b = false), but @gen(b, a) may fail because a terminal state is sampled. Is there a good reason to define "terminal belief" this way? I agree that checking if any terminal state is in the support might be overly reactive. This becomes a problem, for instance, when we want to generate a bunch of samples and early exit on a "terminal belief."

@FlyingWorkshop FlyingWorkshop added enhancement question decision quick This task shouldn't take too long labels Apr 1, 2024
@FlyingWorkshop FlyingWorkshop closed this as not planned Won't fix, can't repro, duplicate, stale Apr 1, 2024
@zsunberg
Copy link
Member

zsunberg commented Apr 2, 2024

@FlyingWorkshop did you close this because it was resolved? If it can throw an error when it samples a terminal state, we should probably fix that. I think @gen should only sample non-terminal states.

@zsunberg zsunberg reopened this Apr 2, 2024
@FlyingWorkshop
Copy link
Member Author

FlyingWorkshop commented Apr 2, 2024

@zsunberg, I closed it b/c I took a look at the source code again and realized that while isterminal(::GenerativeBeliefMDP, b) is true when all support states are terminal, the @gen block actually checks is the sampled state is terminal:isterminal(bmdp.pomdp, rand(rng, b)). There is an issue w/ sampling terminal states, but I think this is expected behavior b/c there's a overwritable handler for this.

I was testing this on TMaze and there would be an error when we fell back on the default handler b/c reward(::TMaze, ::TerminalState, ::Int) isn't defined. I ended up just implemented the reward function, so don't think this is still an issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
decision enhancement question quick This task shouldn't take too long
Projects
None yet
Development

No branches or pull requests

2 participants