Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change how we handle verbosity and logging #302

Open
leouieda opened this issue Feb 9, 2022 · 6 comments
Open

Change how we handle verbosity and logging #302

leouieda opened this issue Feb 9, 2022 · 6 comments
Labels
enhancement Idea or request for a new feature question Further information is requested

Comments

@leouieda
Copy link
Member

leouieda commented Feb 9, 2022

The current implementation uses the logging module to print messages about downloads and processing operations. This has a few drawbacks that we didn't expect when it was first implemented:

  1. We have verbosity on by default by setting a handler for the logger, which is against recommendations for libraries
  2. Pooch is a library meant to be used in other libraries (pooch.Pooch) but also in end-user code (pooch.retrieve) in notebooks/scripts. For libraries, silent Pooch by default makes more sense but when using a notebook default verbosity is probably better since we let users know about downloads, processing, and hashes.
  3. If a library changes the Pooch logging handler (to silence or explicitly enable it), all of their dependencies will have the default Pooch behaviour changed without warning.
  4. Disabling verbosity requires knowing about the logging module.

A better implementation would be to have a verbose keyword argument in Pooch and retrieve that can be used to generate a printer object (like the rich.console.Console) object that is passed on to the downloaders and processors. Or even a global pooch printer that can be configured by a function (I'd rather avoid having mutable global state, though, since it's a nightmare to test). The existing functionality can be maintained having pooch.get_logger return a mock of the Logger object that modifies our printer instead.

This way:

  • Libraries will be properly isolated from each other
  • Setting verbosity is not global can requires only a single keyword argument change
  • Backwards compatibility is maintained by setting verbose=True by default and making get_logger return something that can be used as a global verbosity setting

Or we just give up backward compatibility and release Pooch 2.0.0 (with a deprecation warning in a previous release).

Feedback from users on this would be much appreciated, particularly if Pooch is one of your dependencies.

See #301 #232

@leouieda leouieda added enhancement Idea or request for a new feature question Further information is requested labels Feb 9, 2022
@mathause
Copy link
Contributor

mathause commented Feb 9, 2022

xarray (I think) and regionmask would likely welcome this change.

A proper deprecation could be a bit of a pain. You could check if get_logger().level != default and issue a warning if not? (And supress output if get_logger().level > 20?)

@danshapero
Copy link
Contributor

I'd also welcome this change in icepack.

@dopplershift
Copy link
Contributor

IMO, a much simpler start would be to remove this line:

LOGGER.addHandler(logging.StreamHandler())

By default, the logging module is setup with a lastResort handler that, in the absence of any other handler, sends all WARNING or higher messages to stderr.

I don't have skin in this game, I'm content to roll with whatever Pooch does by default. I'll just say that logging frameworks exist so that all the API calls in a system don't need to grow arguments to control output. I'm also not personally convinced that logging is somehow inadequate here.

@leouieda
Copy link
Member Author

Thanks for the feedback all.

I'll just say that logging frameworks exist so that all the API calls in a system don't need to grow arguments to control output. I'm also not personally convinced that logging is somehow inadequate here.

I agree that logging is great and it makes implementing this sort of thing much easier. But I think the use case that logging is designed for is not how we want to use it. From my understanding, logging is meant to be used:

  • In a library, call logging setting the null handler (i.e., silent)
  • Application developers set the handlers they want for the libraries they want
  • End-users don't write code and see the logging messages that the application developers want them to see
  • The application code is not used to build further applications downstream

For this, the recommend of not setting a handler in library code makes perfect sense. And by having singleton loggers, application developers can set handlers for the entire stack.

But Pooch is used in mainly 2 ways:

  1. Library developers use Pooch. Their users are mostly 1) other library developers downstream (like xarray -> metpy) or 2) people writing scripts/notebooks using the sample data (usually tutorials or docs).
  2. Directly by people writing scripts and notebooks (through pooch.retrieve). This is probably where logging messages are most useful because these people know they are using Pooch.

For 1. if any of the libraries set a handler, then they influence everyone else in the stack in strange ways. For example, if scikit-image adds a handler and metpy doesn't, an end user will see logging messages for scikit-image and metpy in their notebook if they import both but not if only importing metpy. So the only ones who should be setting handlers are the end-users writing the scripts/notebooks during a tutorial or following the docs. They probably won't since the whole idea is that they don't even need to know Pooch exists.

For 2. it would be a bad experience to have to import the logging module and add a handler just to get a message printed when calling retrieve. So something like retrieve(url, hash, verbose=True) makes sense here.

Which then brings us to:

By default, the logging module is setup with a lastResort handler that, in the absence of any other handler, sends all WARNING or higher messages to stderr.

Removing that line would mean Pooch is silent by default (we mostly issue INFO level messages). So we might as well set a NullHandler as the logging docs suggest. If any end-user really cares, they can set the handler (most won't). We would need to warn library devs to not set a handler so they don't influence other libraries. But this would mean that there is no way for libraries to control the verbosity of Pooch and most of the messages will likely never be seen.

I'm OK with that but I'm hesitant to make the change and piss people off who want a way to enable/disable this. I don't see a way of delivering that other than the implementation I outlined above. I know this is probably not a big deal and most people won't care either way.

@mathause
Copy link
Contributor

Just to support the point - xarray sets the pooch log-level when tutorial data is downloaded:

    logger = pooch.get_logger()
    logger.setLevel("WARNING")

Setting the log-level for other users was certainly unintentional, could be fixed easily, and does not have huge consequences, but shows that it's easy to get wrong.

@Dr-Irv
Copy link

Dr-Irv commented Nov 6, 2023

I came across this in the following use case:

  1. Our application code calls geodatasets which calls pooch
  2. pooch issues the "Downloading" message to the console
  3. Our code runs on Google Cloud in such a way that all console messages are captured and displaying in a GCP logging site
  4. For the logs to work right on that GCP site, they need to be formatted as JSON, otherwise they get tagged as an "error"

In our application code, we handle this by setting the root logger to have a console handler that formats the message as JSON.

But due to the line suggested to be removed in this comment: #302 (comment)
the message doesn't get formatted.

My suggested solution is as follows:

Replace LOGGER.addHandler(logging.StreamHandler()) in pooch with:

if len(logging.getLogger().handlers) == 0:
    LOGGER.addHandler(logging.StreamHandler()) 

Then if the user hasn't created any handlers for the root logger, pooch will add its handler, which I think covers the use cases for people using notebooks.

But if a user has added handlers, or a library has added handlers to the root logger, then pooch output will just use those handlers.

I think this addresses your use cases, but not 100% sure.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Idea or request for a new feature question Further information is requested
Projects
None yet
Development

No branches or pull requests

5 participants