A Response and Response model #61

troiwill · 2024-03-25T04:39:51Z

This pull request is the first stage of implementing the cost-constrained POMCP (CC-POMCP) algorithm. This algorithm cannot be added to the repository directly because it has additional variables and operations not present in the PO-UCT and POMCP algorithms. One example is cost constraints and their corresponding operations.

To accommodate costs and other future variables, I propose to use a generic model, called a Response model, and a corresponding output, called a response. The name "response" comes from the notion of independent and dependent variables, where a response (reward, cost, etc.) depends on the interaction with the real or simulated environment. Thus, a response model is a wrapper for more specific models, such as reward and cost models (and any others that will follow in the future). By extension, a response is a wrapper for the reward, cost, etc.
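To make the wrapper idea concrete, here is a minimal pure-Python sketch. It is not the code in this pull request: the `reward` and `cost` fields, the `sample` method, and the callable-based constructor are assumptions chosen for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Response:
    """Hypothetical container for everything one environment step returns."""
    reward: float = 0.0
    cost: float = 0.0

    def __add__(self, other: "Response") -> "Response":
        # Component-wise sum, as needed when accumulating a return.
        return Response(self.reward + other.reward, self.cost + other.cost)

    def __mul__(self, scalar: float) -> "Response":
        # Scaling by a scalar, e.g. a discount factor.
        return Response(self.reward * scalar, self.cost * scalar)

    # Allow `0.5 * response` as well as `response * 0.5`.
    __rmul__ = __mul__


class ResponseModel:
    """Hypothetical wrapper that queries one model per response component."""

    def __init__(self, reward_fn, cost_fn):
        # Each callable maps (state, action, next_state) to a float.
        self._reward_fn = reward_fn
        self._cost_fn = cost_fn

    def sample(self, state, action, next_state) -> Response:
        return Response(
            reward=self._reward_fn(state, action, next_state),
            cost=self._cost_fn(state, action, next_state),
        )


# Toy problem: opening a door pays off, and every non-"wait" action costs 1.
model = ResponseModel(
    reward_fn=lambda s, a, sp: 10.0 if a == "open" else 0.0,
    cost_fn=lambda s, a, sp: 0.0 if a == "wait" else 1.0,
)
first = model.sample("s0", "open", "s1")
second = model.sample("s1", "listen", "s1")
discounted_return = first + 0.5 * second
```

Because the arithmetic lives on the response object, code that accumulates discounted returns would not need to know how many components a response carries.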

This pull request contains the following:

Implementation of the ResponseModel and Response classes in the basics files.
Updates to classes in basics to use ResponseModel instead of RewardModel directly.
Updates to the affected pre-existing algorithms, including PO-UCT and POMCP.
Updates to the code for the tiger, rocksample, mos, load_unload, and tag problems.
A test script for response arithmetic operations, test_response.py.
A passing run of the test_all.py script.
Passing runs of the tiger, rocksample, mos, load_unload, and tag problems.
Comments on the ResponseModel and Response classes.

Simplified the response variable and the model.

zkytony · 2024-03-25T14:05:07Z

pomdp_py/framework/basics.pxd

+
+cdef class Response:
+    cdef float _reward
+


Thanks @troiwill. What is the significance of wrapping a float inside Response? Why does this change need to happen in basics.pyx? It affects every program that currently uses pomdp-py, which would not be acceptable.

zkytony · 2024-03-25T14:41:54Z

pomdp_py/framework/basics.pyx

+    """
+    A Response class that only handles a scalar reward. Subclasses of Response can add
+    more (scalar or vector) variables. But the subclasses must implement how to handle
+    arithmetic and comparison operations.


I see. The idea of a vector reward is interesting, and I value the contribution. But currently the changes are not backwards compatible. I suggest the following changes:

Create a file pomdp_py/framework/generalization.pyx in which you:

Define a class called ResponseAgent that inherits from Agent but takes in a response model instead of a reward model.

Define Response, ResponseModel, and other necessary components.

Define a sample_generative_function specific to ResponseAgent.

Create a file pomdp_py/algorithms/cc_pomcp.pyx in which you implement the CC-POMCP algorithm.

That way you don't need to change the existing code that uses float-based rewards.
Please keep the changes to the original code in framework and algorithms as small as possible. In addition, you shouldn't need to modify all the examples to make your contribution.

Also, please provide a meaningful example that demonstrates CC-POMCP. Currently the test in test_response.py is rather trivial.
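The layout suggested above could look roughly like the following pure-Python sketch. The `Agent` class here is a simplified stand-in rather than the real pomdp_py class, and the constructor arguments, the dict-shaped response, and the body of `sample_generative_function` are all assumptions made for illustration.

```python
class Agent:
    """Simplified stand-in for the existing float-reward agent (not the real pomdp_py class)."""

    def __init__(self, transition_model, reward_model):
        self.transition_model = transition_model
        self.reward_model = reward_model


class ResponseAgent(Agent):
    """Hypothetical agent that carries a response model instead of a reward model."""

    def __init__(self, transition_model, response_model):
        # The base class is left untouched; it simply receives no reward model.
        super().__init__(transition_model, reward_model=None)
        self.response_model = response_model


def sample_generative_function(agent, state, action):
    """Hypothetical generative step for a ResponseAgent: returns (next_state, response)."""
    next_state = agent.transition_model(state, action)
    response = agent.response_model(state, action, next_state)
    return next_state, response


# Toy 1-D walk: the response is a dict so that it can hold several named values.
agent = ResponseAgent(
    transition_model=lambda s, a: s + (1 if a == "right" else -1),
    response_model=lambda s, a, sp: {"reward": float(sp == 3), "cost": 1.0},
)
next_state, response = sample_generative_function(agent, 2, "right")
```

Keeping the new agent and its generative function in a separate module means code that relies on the float-based reward path never has to import or know about them.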

troiwill · 2024-03-27

Hey @zkytony, thanks for the notes. Your suggestion is helpful.

Ideally, I would have liked to reuse methods from your PO-UCT and POMCP implementations in the CC-POMCP algorithm. However, I cannot do so via subclassing because PO-UCT and POMCP return a single value in rollout and simulate, while CC-POMCP returns two values.

To avoid a backwards compatibility issue, I could reproduce most of the PO-UCT and POMCP code in CC-POMCP. Please let me know how this approach sounds.

zkytony · 2024-03-27

Can you subclass and override rollout and simulate?

zkytony · 2024-03-29

@troiwill does that make sense? You can still let CC-POMCP inherit from PO-UCT, but override rollout and simulate so they work internally with multi-valued rewards. You might want to create custom VNode and/or QNode classes, since the values stored in the nodes are also not just floats, I suppose.
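As an illustration of the subclass-and-override pattern described here, the sketch below uses a deliberately tiny stand-in planner. It is neither PO-UCT nor CC-POMCP: there is no search tree and no VNode or QNode, and the class names, `step_fn` signature, and `default_action` parameter are hypothetical. It only shows that a subclass can change what rollout and simulate return while reusing the inherited driver method.

```python
class FloatPlanner:
    """Simplified stand-in for a float-valued planner (not the real PO-UCT class)."""

    def __init__(self, step_fn, default_action, max_depth=3, discount=0.5):
        self.step_fn = step_fn                # (state, action) -> (next_state, reward, cost)
        self.default_action = default_action  # fixed rollout policy, for determinism
        self.max_depth = max_depth
        self.discount = discount

    def plan(self, state, action):
        # Shared driver logic that subclasses reuse unchanged.
        return self.simulate(state, action, depth=0)

    def simulate(self, state, action, depth):
        state, reward, _cost = self.step_fn(state, action)
        return reward + self.discount * self.rollout(state, depth + 1)

    def rollout(self, state, depth):
        total, weight = 0.0, 1.0
        while depth < self.max_depth:
            state, reward, _cost = self.step_fn(state, self.default_action)
            total += weight * reward
            weight *= self.discount
            depth += 1
        return total


class CostAwarePlanner(FloatPlanner):
    """Hypothetical subclass whose rollout and simulate return (reward, cost) pairs."""

    def simulate(self, state, action, depth):
        state, reward, cost = self.step_fn(state, action)
        future_reward, future_cost = self.rollout(state, depth + 1)
        return (reward + self.discount * future_reward,
                cost + self.discount * future_cost)

    def rollout(self, state, depth):
        total_reward, total_cost, weight = 0.0, 0.0, 1.0
        while depth < self.max_depth:
            state, reward, cost = self.step_fn(state, self.default_action)
            total_reward += weight * reward
            total_cost += weight * cost
            weight *= self.discount
            depth += 1
        return total_reward, total_cost


def step(state, action):
    # Toy dynamics: every step yields reward 1 and cost 2.
    return state + 1, 1.0, 2.0


value = FloatPlanner(step, "wait").plan(0, "go")
reward_value, cost_value = CostAwarePlanner(step, "wait").plan(0, "go")
```

In a real tree search the overridden methods would also have to update node statistics, which is where the custom node classes mentioned above would come in.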

troiwill · 2024-04-03

Hey @zkytony, that makes sense. Once I've made all the changes, I will create another pull request. Thanks for the note.

zkytony · 2024-04-03

Thanks! Will close this PR for now.

troiwill added 5 commits March 24, 2024 13:40

Initial implementation of response and response model.

41b6905

Updated pomdp-py problems to use response model.

6058dbe

Simplified the response variable and the model.

4cdcf60

Merge pull request #1 from troiwill/remove-dict-in-response

42e77f0

Simplified the response variable and the model.

Added comments to the code.

95efc47

zkytony requested changes Mar 25, 2024

zkytony reviewed Mar 25, 2024

zkytony closed this Apr 3, 2024

troiwill deleted the response-model branch April 19, 2024 05:09
