Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support AbstractFilePath #192

Open
wants to merge 36 commits into
base: master
Choose a base branch
from

Conversation

mmhat
Copy link
Contributor

@mmhat mmhat commented Jun 21, 2024

This PR adds a new namespace OsPath to the library which contains a version of Path that is a newtype wrapper around the System.OsPath filepaths. As a consequence all functions that took a FilePath as an argument or produced on, or that took/produced a String (e.g. extensions) now use the counterparts from System.OsPath and System.OsString respectively.
toFilePath is an exception to this, it still gives you a System.FilePath.FilePath.
For the implementation I aimed to stay as close to the existing Path modules as possible, that's why there are for example also ports of the deprecated functions included.

As it stands this PR consists of two parts: The first commits are cleanup/refactoring commits of the code that already exists.
Those happened either because something was missing (hie.yaml) or something got in my way (.ghci.conf) or - the biggest part - as a preliminary work before the replication of the various modules. For example, I refactored the test testsuite such that the copy could be easier adapted to the new OsPath hierarchy.

There are some ugly parts where I am not entirely sure if I found the right solution - in particular the aeson instances and everything involving conversion from one encoding to another, where I had to use unsafeDupablePerformIO at some places.

Gently pinging @hasufell @mpilgrem (I noticed to late that you started working on this issue too) and @NorfairKing since I am interested in your feedback specifically.

Fixes #189

mmhat added 29 commits June 18, 2024 15:10
Suppresses redundant patterns warnings.
@hasufell
Copy link

Wrt aeson, also see:

Can you point me to the places where you use unsafeDupablePerformIO or do encoding/decoding?

I'm on my phone and PR reviews are hard.

Generally, we shouldn't need to ever assume an encoding. Using decodeFS/encodeFS can be a sign of a bug, but not always.

@NorfairKing
Copy link
Collaborator

@mmhat I'll gladly review this and probably get it merged eventually, but I need some time to review this carefully.

@NorfairKing
Copy link
Collaborator

@mmhat I just tried building and testing this locally.

Could you get stack test --pedantic to pass?

 - Removed 'OsPath.Internal.toFilePath'
 - Assume Unicode encoding for Aeson type class instances
 - Stack build works with --pedantic flag
@mmhat
Copy link
Contributor Author

mmhat commented Jun 24, 2024

Wrt aeson, also see:

I this blog post a while ago, but I really forgot that JSON marshalling is covered there!
So thanks for the pointer, and thanks for writing this in the first place.

Generally, we shouldn't need to ever assume an encoding. Using decodeFS/encodeFS can be a sign of a bug, but not always.

Ah, yes, you are right; I changed the JSON instances now such they assume Unicode encoding for the underlying filepath.

Could you get stack test --pedantic to pass?

Done.

instance ToJSON (Path b t) where
toJSON =
either (error . displayException) toJSON
. OsPath.decodeUtf

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can avoid the error, but it's up to debate:

  • try decodeUtf (equals utf8 on unix and utf16le on windows)
  • in case of failure, fall back to latin1 on unix and ucs-2 on windows, which both cannot fail (morally... the types of decodeWith lie in this case)

This is what I dislike about aeson. You can't really specify different encodings for the same type, unlike waargonaut.

Copy link
Contributor Author

@mmhat mmhat Jun 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds fine to me.
@hasufell Does your comment also apply to the encoding functions (e.g. encodeWith ucs2le)? Also, I found System.OsString.Encoding.ucs2le, but no TextEncoding for latin1.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://hackage.haskell.org/package/base-4.20.0.1/docs/System-IO.html#v:latin1

encodeWith would error on Chars outside of the range. There is very few, maybe no Encoding that is total under encodeWith, since you can represent Surrogates as Chars.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://hackage.haskell.org/package/base-4.20.0.1/docs/System-IO.html#v:latin1

Thanks; I missed that System.IO.TextEncoding and GHC.IO.Encoding.Types.TextEncoding are the same. The latter is linked in the Haddocks in System.OsString.Encoding, e.g. here.

Regarding the FromJSON/ToJSON instances: The more I think about it the less I am convince that they should be in this library in the first place - Or at least not brought in scope by OsPath. There is clearly more than one sensible instance and throughout the ecosystem that is dealt with by leaving the instance definition to the user of a library; For example there are no Semigroup/Monoid instances for Int for that reason.
It is of course convenient to have some pre-defined instances at hand that cover most of of the use cases, but ideally those would be provided by another (public sub-) library. Personally I don't make use of those instance more often than not, and pulling in aeson as a dependency is in my projects often unnecessary. So ultimately I'd like to get rid of the (mandatory) aeson dependency at some point.

As for this PR, I'm inclined to move the JSON-related instances for OsPath to an own module, say OsPath.Aeson, and add a flag aeson-instances to path that controls whether any JSON-related instances are build at all (defaults to true).

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As for this PR, I'm inclined to move the JSON-related instances for OsPath to an own module, say OsPath.Aeson, and add a flag aeson-instances to path that controls whether any JSON-related instances are build at all (defaults to true).

cabal flags must not control exposed API though, because a package can not specify "I need this flag of one of my dependencies enabled". Cabal has no means to specify this, so the end user may specify an invalid configuration and get a compile error with missing modules.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, yes, you are right indeed. And frankly I should've known better before making such a proposal, given how much time I spent trying to build GHCup before I figured that I have to set a flag for streamly... I'm quiet happy that this changed.

Copy link
Collaborator

@NorfairKing NorfairKing Jun 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We've had a discussion about aeson before:
#136

If there is no good aeson instance (for example because it needs IO) then I'd rather not provide any, but honestly that'd be a bit weird :P.

src/Path/Include.hs Outdated Show resolved Hide resolved
src/Path/Include.hs Outdated Show resolved Hide resolved
, (1, elements (map ord "./\\"))
]
)
shrinkValid _ = [] -- TODO: Not yet implemented
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Didn't we have shrinking before?
This will be important if/when we run into bugs but it's only a maintenance burden so maybe not so important for users.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we did; But the one for FilePath is (obviously) the the same as the one for the general-purpose String, which is provided by the genvalidity package. Since there are no pre-defined instances for OsPath, OsString or OsChar and we have to roll our own, I thought that we might reconsider how we shrink paths specifically.
I didn't find the time to do that, but I didn't want to blindly copy existing code either; Hence the TODO...
But you are right, before we merge this I want to provide an implementation one way or the other.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Revisiting the genValid implementation as well, this looks wrong to me as well: IMHO the generator should not only produce Unicode paths, but random sequences of Word8/Word16 separated by /, \ and ..
I.e. we should not assume an encoding.

Copy link

@hasufell hasufell Jun 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Writing a meaningful generator for windows paths is incredibly hard.

filepath has done this by expressing them in ABNF.

See here: https://github.com/haskell/filepath/blob/master/tests/filepath-equivalent-tests/Gen.hs

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Uh, that looks horrifying! Thanks for the pointer though.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Uh, that looks horrifying! Thanks for the pointer though.

It is beautiful, because it's a pretty good ABNF. I'm planning to actually represent filepaths through that data structure at some point. Otherwise it's not possible to have correct splitting etc. Lots of functions in filepath are ad-hoc.

shrinkValid _ = [] -- TODO: Not yet implemented

instance Validity PLATFORM_PATH where
validate = trivialValidation -- TODO: Not yet implemented
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This definitely looks incomplete. I'd be surprised if these were trivially valid.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TBH, I simply don't know (yet) what constitutes a valid PosixPath/WindowsPath... See also my comments in the previous thread.
@hasufell Do you have any idea what a sensible notion would be?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Depends what we mean with "valid". Do you mean the isValid function from filepath? It's a bit fuzzy. The only thing you need to keep in mind additionally for the new OsPath API when dealing with the constructors yourself is that windows paths are expected to be [Word16]. So if you create a one-byte WindowsString manually, almost all functions will throw error.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think what I meant was answered here: https://stackoverflow.com/questions/1976007/what-characters-are-forbidden-in-windows-and-linux-directory-names

This is not a problem for the instance in the test-suite of this library since we do not pass the generated paths to any OS API, but if we move the instance to a more general-purpose service package like genvalidity-filepath we have to implement those restrictions.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As some of those SO answers say: it's almost impossible to have an exhaustive list of invalid filepaths on windows. The filepath isValid is just best effort.

@NorfairKing
Copy link
Collaborator

@mmhat Great job on this work. It must have been quite a fiddly thing to get right, but you got stack test --pedantic to pass as-is on my machine the first time. 🎉

I had a look through the changes and don't see anything disagreeable.
I left a few nits but nothing strictly blocking I think.

I wasn't confident so I will just ask you: Can you please double-check that the original Path API isn't changed and no tests have been removed?

@mmhat
Copy link
Contributor Author

mmhat commented Jun 25, 2024

@mmhat Great job on this work. It must have been quite a fiddly thing to get right, but you got stack test --pedantic to pass as-is on my machine the first time. 🎉

Thank you!

I wasn't confident so I will just ask you: Can you please double-check that the original Path API isn't changed and no tests have been removed?

I am fairly certain that none got removed but more were added, but I will double-check that and provide a summary of what has changed 👍

Regarding our earlier discussion about genvalidity + System.OsPath instances: Ideally those would be provided by a genvalidity-filpath package as those are not specific to the path library. Would you accept the contribution of such a package in https://github.com/NorfairKing/validity?

@NorfairKing
Copy link
Collaborator

NorfairKing commented Jun 26, 2024

Regarding our earlier discussion about genvalidity + System.OsPath instances: Ideally those would be provided by a genvalidity-filpath package as those are not specific to the path library. Would you accept the contribution of such a package in https://github.com/NorfairKing/validity?

Absolutely!

EDIT: I'd be happy to guide you in that.

- Also started with documentation.
- Expose 'OsPath.fromSomeBase'
- Require aeson >=1.0.0.0 to avoid tons of CPP
@NorfairKing
Copy link
Collaborator

@mmhat If you're ever blocked on me, please ping me.
AFAICT this is blocked on the TODOs, right?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support AbstractFilePath
3 participants