Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bugfix in priority of put!/get of a resource #101

Draft
wants to merge 5 commits into
base: master
Choose a base branch
from

Conversation

jvkerckh
Copy link
Contributor

@jvkerckh jvkerckh commented Oct 6, 2023

  • put!/get (lock/unlock) requests for a resource are now handled in descending priority order, the same way as simulation events.

* put!/get (lock/unlock) requests for a resource are now handled in descending priority order, the same way as simulation events.
@Krastanov
Copy link
Member

Is this the appropriate way to fix this? I am wondering because I imagine someone might be confused that isless does not correspond to a.priority < b.priority anymore? Would it be clearer to fix this by changing the sort order in the consumer of isless?

@jvkerckh
Copy link
Contributor Author

jvkerckh commented Oct 8, 2023

I understand the concern here. What I did was take a look at the isless function for the AbstractEvent type and simplified it, which appeared to be the simplest way of fixing this.

If preferred, I could implement it the way you suggest instead.

@Krastanov
Copy link
Member

I think we indeed should avoid changing the meaning of isless, as that might be a breaking change for some users and it would require us to release version 2.0.

Do you mind adding a test that fails on the current master branch, showcasing what the bug is? I think it would be easier for me to comment after I see that.

jvkerckh added a commit that referenced this pull request Oct 10, 2023
* Added a test that checks if put!/get requests are handled in order of descending priority, as requested by @Krastanov in the discussion of PR #101. This test will currently fail!
@jvkerckh
Copy link
Contributor Author

Test added to test_resource_priorities.jl.

In the test, ev1 and ev2 are unlock requests for resource res, with the priority of ev1 higher than ev2. Hence, as soon as the resource can be unlocked, ev1 should trigger when the simulation runs, and the tests checks for this.

However, the test fails since ev2 is triggered first despite having the lower priority.

@codecov-commenter
Copy link

codecov-commenter commented Oct 10, 2023

Codecov Report

Merging #101 (45a41c7) into master (34ed8dd) will decrease coverage by 1.16%.
Report is 1 commits behind head on master.
The diff coverage is 87.50%.

@@            Coverage Diff             @@
##           master     #101      +/-   ##
==========================================
- Coverage   97.15%   96.00%   -1.16%     
==========================================
  Files          12       12              
  Lines         316      325       +9     
==========================================
+ Hits          307      312       +5     
- Misses          9       13       +4     
Files Coverage Δ
src/base.jl 98.03% <100.00%> (+0.12%) ⬆️
src/resources/base.jl 87.50% <100.00%> (-5.36%) ⬇️
src/resources/containers.jl 100.00% <100.00%> (ø)
src/resources/ordered_stores.jl 100.00% <100.00%> (ø)
src/resources/stores.jl 97.72% <100.00%> (ø)
src/resources/delayed_stores.jl 58.33% <50.00%> (ø)
src/simulations.jl 92.85% <60.00%> (-7.15%) ⬇️

📣 Codecov offers a browser extension for seamless coverage viewing on GitHub. Try it in Chrome or Firefox today!

* Changed the implementation of the bugfix to priority handling for put!/get requests to avoid changing the meaning of `isless` as per the request of @Krastanov in PR JuliaDynamics#101.
@jvkerckh
Copy link
Contributor Author

Also changed the isless function to maintain the intuitively understood meaning of the function. Fortunately the only additional change that was needed was in the trigger_put and trigger_get functions.

@Krastanov
Copy link
Member

Apologies for my slowness in reviewing this, I have a few urgent responsibilities to deal with until Oct 28th -- I will review promptly afterwards

@Krastanov
Copy link
Member

Did you mean to merge the test on master? It seems there is already a change on the master branch related to this that has made the CI badge go red.

@jvkerckh
Copy link
Contributor Author

Did you mean to merge the test on master? It seems there is already a change on the master branch related to this that has made the CI badge go red.

I thought you meant that when you asked me "Do you mind adding a test that fails on the current master branch, showcasing what the bug is? I think it would be easier for me to comment after I see that."

If not, my apologies for misunderstanding.

@Krastanov
Copy link
Member

My bad as well, I should have been clearer. I usually aim to have the master branch always green (it makes git bisect easier and it avoids other issues). I meant adding the test in this branch so it is isolated from the master branch but still visible in the PR CI.

@jvkerckh
Copy link
Contributor Author

Any way to perform a roll back on that accidental/undesired change?

@Krastanov
Copy link
Member

No need, it is minor and this PR will be merged soon enough, taking care of it.

@Krastanov
Copy link
Member

Got around to actually reviewing and looking up the documentation of PriorityQueue and the notion of priority documented in this package, and it is actually a bit underdefined and I am getting confused. I could not find a piece of documentation or test code that actually imposes what is the "correct" priority ordering and how to use it. The word priority is mentioned only once in the documentation and not mentioned in any docstrings.

I have a couple of questions.

  1. Is it true that there is no public API for setting the priority of simulation events? If there is one, could you post it here?
  2. All get/set events on Resources do have a priority and that one can be set with the priority keyword, right? This pull request is about ensuring that the order in which they are executed is descending in priority, and that is done for consistency with the order in which simulation events are executed, right?
  3. Both the current version of the PR and the first version of the PR should be considered breaking changes, because people's code out there might already depend on the current "increasing in priority order" convention, right?

If in point 1 I am correct that there is no public API for simulation event priority (I am not sure of that), should we instead make the change there to ensure that we do not have "breaking change" issues as discussed in point 3?

If I am correct about point 1, maybe we should also create a public API for simulation event priorities?

@Krastanov
Copy link
Member

Here is a toy I set up to try to understand how priorities for resources work:

on master

julia> sim = Simulation(); res = Resource(sim); global_log = [];

julia> @resumable function lock_and_log_in_global(sim, res, priority, msg)
           @yield lock(res, priority=priority)
           push!(global_log, msg)
           unlock(res)
       end;

julia> @process lock_and_log_in_global(sim, res, 2, "priority 2")
Process 1

julia> @process lock_and_log_in_global(sim, res, 0, "priority 0")
Process 3

julia> @process lock_and_log_in_global(sim, res, 1, "priority 1")
Process 5

julia> @process lock_and_log_in_global(sim, res, 1, "priority 1 again")
Process 7

julia> @process lock_and_log_in_global(sim, res, 0, "priority 0 again")
Process 9

julia> @process lock_and_log_in_global(sim, res, 2, "priority 2 again")
Process 11

julia> run(sim)

julia> global_log
6-element Vector{Any}:
 "priority 2"
 "priority 0"
 "priority 0 again"
 "priority 1"
 "priority 1 again"
 "priority 2 again"

The very first event seems wrong, but the other events seem correct, assuming ordering by ascending priority. Did I do something wrong here?

on this PR

julia> @process lock_and_log_in_global(sim, res, 0, "priority 0")
Process 1

julia> @process lock_and_log_in_global(sim, res, 2, "priority 2")
Process 3

julia> @process lock_and_log_in_global(sim, res, 1, "priority 1")
Process 5

julia> @process lock_and_log_in_global(sim, res, 1, "priority 1 again")
Process 7

julia> @process lock_and_log_in_global(sim, res, 0, "priority 0 again")
Process 9

julia> @process lock_and_log_in_global(sim, res, 2, "priority 2 again")
Process 11

julia> run(sim)

julia> global_log
6-element Vector{Any}:
 "priority 0"
 "priority 2"
 "priority 2 again"
 "priority 1"
 "priority 1 again"
 "priority 0 again"

Here we have the same issue: the first event seems wrong, but then the ordering is correct, assuming sorting by decreasing priority.

Conclusion

I think there might be an unrelated problem with the priorities for resources (the first event being mistreated), but I think we should keep the ascending order in the resources priorities, because otherwise there would be a breaking change.

If there is a public API for simulation event priorities, the way there is one for resources priorities, indeed we should make the two APIs match, and that is an unavoidable breaking change. But if there is no public API for simulation event priorities, maybe we can fix this mismatch in orders by changing the non-public one.

Apologies for not investigating this earlier, before the previous round for discussions and prompting you to modify the PR -- I have not used the priorities functionality before and did not suspect how much I am missing.

@jvkerckh
Copy link
Contributor Author

I'll try to answer both your posts in one go.

Answers to questions

Question 1

Is it true that there is no public API for setting the priority of simulation events? If there is one, could you post it here?

That's correct, there is no public API for this. As far as I understand the code, this is because when an event is defined, it is immediately inserted into the priority queue of the simulation for simulation events, or in the appropriate priority queue of the resource for put!/lock and get/unlock events.

Question 2

All get/set events on Resources do have a priority and that one can be set with the priority keyword, right? This pull request is about ensuring that the order in which they are executed is descending in priority, and that is done for consistency with the order in which simulation events are executed, right?

Correct on both counts. The priority keyword argument in the put!/get (lock/unlock) is normally 0, but can be set to whatever number desired within the type permitted for the specific resource of course (usually Int).

A simulation event's priority is determined based on the following three criteria:

  1. Time: events scheduled for an earlier simulation time are executed before events scheduled for a later execution time.
  2. Priority: for two events scheduled for the same simulation time, the event with the higher priority is executed before the event with the lower priority.
  3. ID: for two events scheduled for the same simulation time AND with the same priority, the event with the lower ID (the event that was scheduled earlier in the code) is executed before the event with the higher ID.

Since resource put!/get events don't have a time associated with them, only steps 2 and 3 need to be checked. This is the reason why the first iteration of this PR was the way it was.

Question 3

Both the current version of the PR and the first version of the PR should be considered breaking changes, because people's code out there might already depend on the current "increasing in priority order" convention, right?

Indeed. A very good point, which I had failed to consider.

Question 4

If I am correct about point 1, maybe we should also create a public API for simulation event priorities?

Given that the PR as it is now will introduce potential breaking changes, this might indeed be desirable. In particular, I would suggest the following:

  1. An optional keyword argument for Simulation that affects how simulation event priorities are handled (highest or lowest first). The default setting will be highest first. This will require the definition of a second non-public ordering function, and maybe rename the existing one for clarity instead of (ab)using isless.

  2. An optional keyword argument for all resource objects that does the same, but where the default setting will be lowest priority first to avoid breaking existing code of users.

  3. A public method setpriority! that acts on an event. I believe this should be doable because of the way events are internally stored in the corresponding priority queue, and the event's priority is part of the EventKey which determines the event's place in that priority queue.

More changes might be needed to ensure 100% backwards compatibility, so this list of proposed changes isn't exhaustive.

Uncovered priority issue

Your toy problem revealed an issue that I haven't seen before, and it's a real puzzler.

Also, no need to apologise for not investigating this earlier. If anyone should've noticed, it should've been me since I've been using SimJulia/ConcurrentSim for close to 7 years now, though never with resource put!/get priorities.

Summary

Given the fact that my proposed changes are breaking, I agree to not incorporate this PR in the master branch as it is now. Because of that, it might be wise to roll back the currently failing test that I (unwisely) included in the master branch.

I'll definitely take another look at how resource events are scheduled to really understand what is going on there since it honestly makes no sense whatsoever at first glance.

If desired, I can also look into the changes proposed in my answer to your 4th question.

@jvkerckh
Copy link
Contributor Author

I've taken an in-depth look at what is happening in the example that you posted.

After initialising the simulation and assigning the processes, but before run(sim), the six processes get scheduled. As such, the event queue (field heap) for sim will contain 6 events:

julia> sim.heap
DataStructures.PriorityQueue{ConcurrentSim.BaseEvent, ConcurrentSim.EventKey, Base.Order.ForwardOrdering} with 6 entries:
  BaseEvent(Simulation time: 0.0 active_process: nothing, 0x0000000000000002, Function[#1], scheduled, nothing) => EventKey(0.0, 0, 0x0000000000000001)
  BaseEvent(Simulation time: 0.0 active_process: nothing, 0x0000000000000004, Function[#1], scheduled, nothing) => EventKey(0.0, 0, 0x0000000000000002)
  BaseEvent(Simulation time: 0.0 active_process: nothing, 0x0000000000000006, Function[#1], scheduled, nothing) => EventKey(0.0, 0, 0x0000000000000003)
  BaseEvent(Simulation time: 0.0 active_process: nothing, 0x0000000000000008, Function[#1], scheduled, nothing) => EventKey(0.0, 0, 0x0000000000000004)
  BaseEvent(Simulation time: 0.0 active_process: nothing, 0x000000000000000a, Function[#1], scheduled, nothing) => EventKey(0.0, 0, 0x0000000000000005)
  BaseEvent(Simulation time: 0.0 active_process: nothing, 0x000000000000000c, Function[#1], scheduled, nothing) => EventKey(0.0, 0, 0x0000000000000006)

These events correspond to the processes as entered. Checking the resource shows that it has been initialised as it should be.

julia> res.level
0

The first step of the simulation runs Process 1, that is @process lock_and_log_in_global(sim, res, 2, "priority 2") until the @yield lock(res, priority=priority) statement. The key to what is happening lies in what that statement does exactly. It executes the function lock(res, priority=priority) which returns an event, and yields execution of the process until that event is processed by the simulation.

Using ConcurrentSim.step(sim), the simulation can be advanced one step/event (this method probably isn't exported because it has a very different interpretation in Base). After doing so, the event queue looks like

DataStructures.PriorityQueue{ConcurrentSim.BaseEvent, ConcurrentSim.EventKey, Base.Order.ForwardOrdering} with 6 entries:
  BaseEvent(Simulation time: 0.0 active_process: nothing, 0x0000000000000004, Function[#1], scheduled, nothing)     => EventKey(0.0, 0, 0x0000000000000002)
  BaseEvent(Simulation time: 0.0 active_process: nothing, 0x0000000000000006, Function[#1], scheduled, nothing)     => EventKey(0.0, 0, 0x0000000000000003)
  BaseEvent(Simulation time: 0.0 active_process: nothing, 0x0000000000000008, Function[#1], scheduled, nothing)     => EventKey(0.0, 0, 0x0000000000000004)
  BaseEvent(Simulation time: 0.0 active_process: nothing, 0x000000000000000a, Function[#1], scheduled, nothing)     => EventKey(0.0, 0, 0x0000000000000005)
  BaseEvent(Simulation time: 0.0 active_process: nothing, 0x000000000000000c, Function[#1], scheduled, nothing)     => EventKey(0.0, 0, 0x0000000000000006)
  BaseEvent(Simulation time: 0.0 active_process: nothing, 0x000000000000000d, Function[#1, #1], scheduled, nothing) => EventKey(0.0, 0, 0x0000000000000007)

which is as expected: the first event has been handled, and the Put event generated by lock has entered the (rear of the) queue. Examining the resource reveals that the resource has already been allocated.

julia> res.level
1

This is expected behaviour due to how put! and get (lock and unlock) work

  1. These functions generate the appropriate event, Put or Get, and enter it into the correct resource queue.
  2. They generate the necessary callbacks for handling get or put requests that are in the respective resource queues. These callbacks are attached to the event generated in step 1.
  3. They check the queue of put or get requests and process all requests that can be met, stopping at the first that cannot be met. If a request can be met, the event generated in step 1 is put in the simulation queue, and the level of the resource is adjusted.
  4. They return the event generated in step 1.

Linking back to the example, the lock request from Process 1 can be served immediately as per step 3 of the lock request, so that is what happens before the process yields execution to the next scheduled event, which is the reason for the observed anomaly.

If there's anything about this that is unclear still, please let me know.

@Krastanov
Copy link
Member

Thank you for the detailed explanation, this now makes a lot of sense! I will not be able to answer the various planning questions until next week, but in the meantime, I vote we keep the test you had added and just change it to @test_broken -- it is a valid test, so we can just as well mark it as a known broken test.

@jvkerckh
Copy link
Contributor Author

jvkerckh commented Oct 19, 2023

Alright. Will you make the change to the test, or shall I?

* Added an optional keyword argument, `highpriofirst` to Simulation, Container, Store, and their derivatives, so the user can select how they wish priorities to be handled (`true` for high to low, `false` otherwise).
* Default for Simulation is `true`, default for Container and Store is `false` so as not to break existing code.
* Added the optional keyword argument to all methods and Resources derived from base Container and Store.
* Added a line about the keyword argument in the docstring of Container and Store.
@jvkerckh
Copy link
Contributor Author

I've made the first two changes that I suggested a few posts back. A detailed overview of what's changed:

  • Added two new ordering schemes HighPrioFirst and LowPrioFirst as subtypes of Base.Ordering with shortcuts HighPrio and LowPrio for instances of these schemes, and a function pickorder that takes a Bool argument and returns HighPrio/LowPrio on true/false. It is probably useful to export these. Thoughts?
  • Added a keyword argument highpriofirst (default true) to the constructor of Simulation.
  • Added a keyword argument highpriofirst (default false) to the constructors of Container and Store, and their derived methods (I forgot about the derived methods the first time 'round).
  • Added Base.lt methods for EventKey and ResourceKey for both ordering schemes. The isless implementation for ResourceKey is left in for posterity in case users want to define a custom resource without using the ordering schemes above.

@Krastanov
Copy link
Member

Thanks! I will plan to provide a review early next week.

@Krastanov
Copy link
Member

Apologies, this fell off my radar.

This looks very near good to merge, but I have a few questions. Before merging though, could you share your thoughts on the following:

Resource priorities have had a public API for a while and it is highpriorityfirst=false. This is adding the possibility for Resources highpriorityfirst=true and it adds a couple of tests.

However, Simulation did not have a public API for priority ordering. This is making such an API public and it is defaulting to highpriorityfirst=true as a default. There is a possibility for highpriority=false now, but there are no tests for that.

Given the lack of testing, I would suggest one of the following changes:

  • keep Simulation with highpriorityfirst=true but throw an error("Simulations do not support the requested event priority order yet.")
  • just remove that flag from the Simulation constructor altogether (I prefer that option as it gives us more flexibility in the future)

My reasoning is that I am scared of supporting functionality that does not have extremely thorough tests, especially one so basic (I would not be surprised if the ordering is used implicitly somewhere deep inside the internals of the library, leading to breackage).

I think it makes sense to export the priority orderings you have defined.

@Krastanov
Copy link
Member

Hi, Johan! I will convert this back to draft, just to make it easier for me to track what I need to be reviewing among the handful of repos I work on. Please do not hesitate to convert it back to non-draft.

@Krastanov Krastanov marked this pull request as draft June 10, 2024 14:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants