Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Interest in adding type annotations? #330

Open
rsokl opened this issue Dec 3, 2022 · 5 comments
Open

Interest in adding type annotations? #330

rsokl opened this issue Dec 3, 2022 · 5 comments
Labels
enhancement Idea or request for a new feature

Comments

@rsokl
Copy link

rsokl commented Dec 3, 2022

Hello! I am a big fan of pooch! 😄

Would you be interested in my adding type annotations to the library? I think that this can help improve the user experience -- via enhanced 'intellisense' in IDEs when working with pooch, and for instant feedback when a user makes a mistake. I have experience creating/maintaining hydra-zen which makes extensive use of advanced typing features, and I have helped Hypothesis adopt type annotations.

Some things that this would entail:

  • Adding typing_extensions as a dependency (this is already quite ubiquitous, and is practically part of the std-lib). This provides backports of nice typing features, such as Literal and Protocol, so that these features can be leveraged without breaking compatibility with older versions of Python.
  • Adding both mypy and pyright CI jobs that scan of source and example code, in order to ensure that the annotations are consistent and accurate, and so that they remain this way.

Let me know what you think!

@santisoler
Copy link
Member

Hi @rsokl! Sorry for the delayed reply. I'm really glad you opened this issue. We have been talking about type annotations but we haven't decided yet if this is something we want to implement. I know that introducing them offers some benefits when working inside IDEs, is there any other benefit that you can tell us about?

On the other side I found a few drawbacks of type annotations:

  • Lowers the readability of the code. When using type annotations the function definitions are a bit more complex. I think this impacts specially on new users and new contributors: understanding how to use or contribute to that code base could be harder if they aren't familiar with type annotations (and most Python novices aren't).
  • It doesn't enforce type checking. We don't get any type checking on runtime by adding type annotations, only some IDEs and language servers use them to warn the user if they are passing an "offending" value. Therefore the benefits of type annotations would be leveraged only by users of those specific IDEs and LS. Please correct me if I'm wrong.
  • It won't replace the docstrings, so we are duplicating stuff. I don't think that type annotations should replace the type specifications we include on the docstrings of our functions, therefore adding type annotations would mean that we have duplicated specifications of the "accepted" types for each parameter. Is there a tool that can automatically check if type annotations and docstrings differ?
  • It somewhat limits the duck-typing spirit of Python. Wether I like it or not, ducktyping is a thing in Python. Sometimes it bites us, but in a lot of situations we can leverage it. It seems that type annotations could impose a soft boundary on the type of the arguments we pass to our functions/methods. Am I right or do I have a wrong perception of this?
  • Require more work. Although this item is a bit subjective I want to mention it. Since our libraries don't use type annotations, it would require some non-neglible amount of work to implement them. Does this investment pay off in the long run? (this is probably the most important and hardest question I ask myself about type annotations).

I know that projects that I love, like xarray, are using type annotations. In particular, xarray implements them in some functions/methods, but not on every one of them. So that makes me wonder if I'm missing some extraordinary benefit of type annotations besides the ones that come with an IDE. And also if maybe having type annotations in only a portion of our functions/methods is sufficient to get all the benefits that they carry.

I leave here two blogposts with oposite opinions on type annotations:

Would love to hear more about your experience and the benefits of type annotations. Even though I listed a few drawbacks I'm open to be convinced that they do worth the effort.

@rsokl
Copy link
Author

rsokl commented Jan 27, 2023

Hi @santisoler thanks for getting back to me.

I know that introducing them offers some benefits when working inside IDEs, is there any other benefit that you can tell us about?

It should be noted that a static type checker like mypy has a CLI. It can be run on one or more .py files and it will provide detailed error/warning reports on those files, based on its static analysis of the code (e.g. if it sees that you called a string method on a variable that it infers to be a bool, it will report an error). The point being that these type checkers are not only to be used in an IDE; they are frequently used in a project's CI/CD pipeline. More on this later.

For the average user, the main benefit would indeed that their IDE would be able to provide useful warnings and autocompletions for code that involves pooch's API. For me, as a user, this can be quite a boon to my productivity. Having type hints help guide my intuition so that I do not need to stop and refer back to the API docs ("okay when they say path, do they mean a str, Path, or either?"). I can write code much faster and with more confidence when I am "tool-assisted" via type annotations.

For library a developer who wants to build off of pooch the benefit of type annotations can be far more dramatic. It makes code refactoring easier and much safer, and it makes it easier for me to test the parts of my code that touch pooch.

Type annotations make it significantly easier and safer to refactor library code 1. For example, I might change some of my library's internals so that they pass around pathlib.Path objects instead of str, and this may impact some of my code that calls pooch. If pooch's API is accurately annotated, then I will immediately know if my str -> pathlib.Path change will cause issues with my pooch usage. Without annotations, I have to hope that I have a test that will ferret out any issues at runtime.

Also consider that it can be challenging to write tests for code that involves pooch. E.g., I may have a function that uses pooch to download several gigs of data, which I do not want to run every time that I exercise my test suite. How do I get at least some assurance that this code works, even as I upgrade to new versions of pooch? I could use monkey patching and mocks to write a basic smoke test for this code, but it is far simpler for me to just run mypy/pyright in my CI to get assurances that my code is using pooch in a way that is consistent with its type annotations. I can even run this static analysis against the main branch of pooch so that I see if there are any upcoming changes in pooch that will impacts the way my library uses it. This future-proofing is not possible with mocks.

On the other side I found a few drawbacks of type annotations:

  • Lowers the readability of the code: First, I should point out that Sphinx can be configured to exclude annotations from the rendered reference docs. I recommend this, as it helps to ensure that the API reference docs remain clean/simple. Also, using an auto-formatter like black can help keep your annotated code legible by introducing generous line breaks. Otherwise I can only argue from anecdote: I have taught a Python course to high school students for 6 years, and I find that type annotations do not bog them down or confuse them. At first they just ignore them, and it isn't a big deal. Later, once they understand what they are, the annotations help them far more than hurt them.
  • It doesn't enforce type checking: There are some popular 3rd party libraries that will enforce type hints at runtime. beartype is likely the most popular runtime type checker at this point. If pooch chose to use something like beartype, it would effectively add runtime type checking -- with verbose error messages -- to all of pooch's interfaces, with minimal boilerplate code.
  • It won't replace the docstrings, so we are duplicating stuff: This is true and is unfortunate. I use numpy-style docstrings, and so I also eat this duplication on a regular basis. Fortunately, this duplication is mainly relegated to public API, and I don't often change these annotations, so it is typically a one-time cost (It would be nice if there was a tool that could manage annotation syncing between signatures and docs)
  • It somewhat limits the duck-typing spirit of Python: Fortunately, with the introduction of PEP 544 this is no longer the case. This PEP introduces "protocols" for structural subtyping (a.k.a duck typing). E.g. you can define a CanQuack type that is satisfied by anything that exposes a quack() -> str method.
  • Require more work [...] Does this investment pay off in the long run? I certainly think that this investment pays off (big surprise, I know 😄 ). I have already made an argument about the benefits that users and library authors will experience. But I also think that type hints makes pooch itself easier to maintain and more reliable. Devs will get instantaneous feedback about any inconsistencies that manifest when they make a change to the internals. This also means that newcomers will have an easier time making contributions to pooch. Lastly, having browsed through the internals, I think that adding annotations to pooch will be relatively easy. I could probably fully annotate the library and add mypy/pyright scans to the CI in 1-2 days if you are interested to see what a first pass looks like.

So that makes me wonder if I'm missing some extraordinary benefit of type annotations besides the ones that come with an IDE

Being able to get near instant feedback about code errors -- without having to run the code -- is incredibly useful. Depending on a project's scale, it is arguably essential. The initial push for static type checking in Python came from DropBox. They had a massive code base that was becoming increasingly hard to reason about, and they needed a way to get instant feedback when they modified their code. So they made mypy. Similarly, Microsoft acquired a ML company and inherited a huge mess of Python code. The team lead decided that, without static type checking, this code represented a big liability, so he wrote pyright 23. For these teams/projects, having static type checking is truly essential. I think that the addition of type checking in Python has been transformational for the language and that it has enabled companies to more sustainably depend on Python in production.

I am at the point where using unannotated code feels like I am missing one of my senses, especially when I am writing a library. I am so used to relying on static type analysis to get feedback on my code, that it feels... really bad to use code that is missing annotations. This is how I ended up opening this issue! To be frank, I find that my workflow is disrupted whenever I begin interacting with pooch because it is missing annotations.

Footnotes

  1. This holds for libraries that depend on pooch as well as for pooch itself!

  2. The author of pyright, Eric Traut, was the project lead for the Windows 7 core team. He is very very experienced. The fact that he felt it was worthwhile to write a type checker from scratch to support his team's ability to inherit a large Python project speaks volumes to how much he, in his great amounts of experience, values static type checking.

  3. Google and Facebook have also invested in writing their own static type checkers for Python.

khaeru added a commit to iiasa/message-ix-models that referenced this issue May 26, 2023
khaeru added a commit to iiasa/message-ix-models that referenced this issue Jun 15, 2023
@leouieda
Copy link
Member

Hi @rsokl thank you so much to the very detailed explanation! I honestly haven't bothered to look into type hints so far but you kind of convinced me to have a go.

One final question about this: Could you implement this incrementally or would it need to be all or nothing for it to be useful?

If you can implement this one class/function/module at a time, would you mind doing it for just the Pooch class so we can get a better sense for what it would look like?

@santisoler
Copy link
Member

santisoler commented Jun 21, 2023

Sorry for not replying... I think this issue got buried in my notifications 🤦🏼

I agree with @leouieda that we could give it a try with incremental steps, if that's possible...

I've started experiencing with type hints, and I found it a little bit cumbersome with complex types like Numpy and Xarray objects... but for simpler Python types (likes the ones we use in Pooch) I think working with them is much more straight forward.

measrainsey pushed a commit to iiasa/message-ix-models that referenced this issue Jul 19, 2023
@eliotwrobson
Copy link

eliotwrobson commented Mar 11, 2024

I just started using pooch and I think this is a great tool! +1 on adding type annotations, I think it would be a substantial help giving the additional information to my editor while coding, and I think it makes code arguably more readable if you're using a formatter like black or ruff. I've also added type hints to other packages after first release, and I think it has been a substantial improvement for the usability.

To jump in on the other questions:

One final question about this: Could you implement this incrementally or would it need to be all or nothing for it to be useful?

This can definitely be implemented incrementally. By default mypy doesn't check typecheck functions that don't have annotations, so you can make changes to one thing at a time. Also, mypy won't use the annotations when typechecking other packages unless there is a py.typed file, meaning that you can wait to add this until things are locally stable.

If you can implement this one class/function/module at a time, would you mind doing it for just the Pooch class so we can get a better sense for what it would look like?

I realize this was not directed at me, but I'm going to take a crack at it and put up a PR with some minimal changes and try to add the typechecking to the CI config.

EDIT: Draft PR here #404

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Idea or request for a new feature
Projects
None yet
Development

No branches or pull requests

4 participants