Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SlogLogger emit data config #349

Open
wants to merge 8 commits into
base: main
Choose a base branch
from

Conversation

itrumper
Copy link
Contributor

@itrumper itrumper commented May 9, 2023

I found myself wanting to be able to customize what data the SlogLogger observer was logging, so I attempted to implement this functionality via the builder pattern as shown in the doc example in the data method on SlogLogger.

To do this, I needed to add the Debug trait as a trait bound to the type Param associated types, hence why this PR touches so many files. This could cause breaking changes for users.

Since this uses a builder style pattern to define what data is logged, the default contains the same data as what was previously hard coded, so end users should not experience any breaking changes in their observer configurations. To customize the data that is logged, a new enum called StateData has been defined in the state/mod.rs file that covers all the available data you can get from the State trait. A user supplies a Vec of these enum variants and then the logger iterates through and emits them. The Debug trait was needed because I wanted to print out the Params as well, which are generics.

The motivation for doing this was to actually be able to log the gradient of a problem as well, but since not every solver has a defined gradient, I'm less sure that can be done. I will need assistance in evaluating that possibility, since it would significantly help with debugging!

Let me know what you think! Thanks for building a cool crate :)

@codecov-commenter
Copy link

Codecov Report

Patch coverage: 61.62% and project coverage change: -0.15 ⚠️

Comparison is base (ae98919) 93.35% compared to head (39b832c) 93.21%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #349      +/-   ##
==========================================
- Coverage   93.35%   93.21%   -0.15%     
==========================================
  Files         117      117              
  Lines       18803    18881      +78     
==========================================
+ Hits        17554    17599      +45     
- Misses       1249     1282      +33     
Impacted Files Coverage Δ
argmin/src/core/state/iterstate.rs 99.90% <ø> (ø)
argmin/src/core/state/linearprogramstate.rs 98.24% <ø> (ø)
argmin/src/core/state/populationstate.rs 97.53% <ø> (ø)
argmin/src/solver/conjugategradient/cg.rs 99.16% <ø> (ø)
...rgmin/src/solver/conjugategradient/nonlinear_cg.rs 98.25% <ø> (ø)
...n/src/solver/gaussnewton/gaussnewton_linesearch.rs 92.69% <ø> (ø)
...rgmin/src/solver/gaussnewton/gaussnewton_method.rs 96.51% <ø> (ø)
...gmin/src/solver/gradientdescent/steepestdescent.rs 95.04% <ø> (ø)
argmin/src/solver/landweber/mod.rs 98.33% <ø> (ø)
argmin/src/solver/linesearch/backtracking.rs 99.05% <ø> (ø)
... and 18 more

☔ View full report in Codecov by Sentry.
📢 Do you have feedback about the report comment? Let us know in this issue.

@stefan-k
Copy link
Member

Thanks a lot for this work! I apologize again to you for the very late response. Unfortunately you chose the worst possible time to issue a PR this big as I'm currently extraordinarily busy ;) Therefore I'm unfortunately unable to give a detailed review right know, but I do want to give you a bit of feedback before you invest more time into this PR.

Generally, I'm very happy to see filtering implemented in the SlogLogger; however, I'm not convinced that this approach is the way to go. A few points:

  • Having an enum to represent the possible metrics is limiting, because each solver can define their own metrics sent to the observer interface. In order to be as flexible as possible, I think there is no way to get past string comparisons.
  • A Debug bound on Param shouldn't be necessary (I think). I would have to look at it in detail, but the observer itself could enforce this bound without it being used all over the place in the core code. As an example you could have a look at file checkpointing, where a Serialize bound is only required when checkpointing is actually used. I'm however not sure how well this also applies to the observer case.
  • I don't think state is the appropriate place to store which fields to observe. As this is observer specific, I think it should be possible to only change the observer code.

I had the need for filtering also in a new observer I am working on in #311. I chose to collect a list of strings and then only select the ones listed (see line 148 in this file). Note that this code essentially does something quite different to what you want to achieve, but I think the approach should be similar.

Regarding observing parameters/gradients and so on: I would love to see this, but maybe it would be easier to implement this in a dedicated observer? I'm not sure though, just an idea to think about :)

Thanks again for this contribution, I really appreciate it! I hope that not too much of your work needs to be changed and I hope that I will be able to give you a detailed review soon, but unfortunately I cannot promise it right now.

@itrumper
Copy link
Contributor Author

Thanks for taking the time to look it over and provide feedback! That is a great help, and thanks for letting me know about your availability. I don't have expectations of a timeline, so please don't feel pressured to work on this.

In response to your points:

  1. I see that point, we need a flexible enough interface to accommodate Solvers different parameters. However, I want to try to find a way that implements better type safety than string comparisons, but still is extensible. I don't know what that looks like right now, so I'll think about it more.
  2. I will certainly look into the file checkpointing! Seems like you're describing a trait bound behind a feature gate?
  3. I may have miscommunicated about this point. Which data to log isn't stored in State but rather the Observer itself. I only modified that file to add the enum that defines what "names" of data are available.

I'll check out the observer you are working on, thanks for the references!

For the solver specific data, like parameters and gradients, I do think it would take a different type of observer, as right now the Observer doesn't know which parameters are available, just what is exposed through the State trait, correct?

I'm happy to completely redo it all if a better solution is out there! You know your crate, and how pieces fit together better than I do, so I'll happily take your direction. Thanks for the discussion.

@stefan-k
Copy link
Member

stefan-k commented Nov 4, 2023

1. I see that point, we need a flexible enough interface to accommodate `Solver`s different parameters. However, I want to try to find a way that implements better type safety than string comparisons, but still is extensible. I don't know what that looks like right now, so I'll think about it more.

I agree that string comparisons aren't the most elegant solution, not only for type safety, but also for performance reasons.

2. I will certainly look into the file checkpointing! Seems like you're describing a trait bound behind a feature gate?

FileCheckpoint is behind the serde feature gate, but this is not what's "(de)activating" the trait bound. FileCheckpoint imposes a Serialize trait bound on the solver and the state (here), but only in case FileCheckpoint is actually used. If a user doesn't use checkpointing in their code, the state and solver do not need to be serializable.

3. I may have miscommunicated about this point. Which data to log isn't stored in `State` but rather the `Observer` itself. I only modified that file to add the `enum` that defines what "names" of data are available.

I see, sorry for the misunderstanding!

For the solver specific data, like parameters and gradients, I do think it would take a different type of observer, as right now the Observer doesn't know which parameters are available, just what is exposed through the State trait, correct?

That's a good point! The State trait does expose the parameters via get_param and get_best_param (here). Gradients are not accessible via the State trait, but it should be possible to create a dedicated GradientObserver which only works with IterState states. Then you will have access to all fields of the IterState and it is easy to turn the observer on and off for debugging.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants