Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

A document type for error messages #13169

Merged
merged 7 commits into from
May 30, 2024

Conversation

Octachron
Copy link
Member

Currently when the compiler needs to build a fragment of an error message while playing nice with Format, the best option is to construct a Format.formatter -> unit closure. This construction is used in many part of the compilers.
However, it has proven to be quite brittle due to the simultaneous uses of many global states when printing error messages, see #13099 for a recent example (which was triggered by a printing closure run in a different environment that the one it was built in).

This PR proposes to remove this brittleness by a introducing a document type which represents a list of formatting instructions

type element =
  | Text of string
  | Open_box of { kind: box_type ; indent:int }
  | Close_box
  | Open_tag of Format.stag
  | Close_tag
  |  ...

to be sent to an actual formatting engine at a later point. Since this document type contains only data and no closures (except for an escape hatch), we are guaranteed that printing this type does not create side-effects without losing any of the formatting abilities of Format. In particular, this document type can be serialized while preserving all the break hints, formatting boxes and tags (which semantics could then be reimplemented by developer tools).

In term of implementation, this PR adds a new Format_doc module which provides:

  • format string interpreters that output this new document type
  • printing functions that mirrors the printing functions in the Format module
  • a conversion function from Format_doc printer to Format printer
  • a "deprecated" conversion function in the other direction which uses a temporary escape hatches
  • A submodule for building document in an immutable way

As a consequence the Format_doc module is source-compatible with the Format module, and for most of the compiler modules replacing open Format by an open Format_doc is enough to switch to the new kinds of printers.

Copy link
Member

@gasche gasche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is nice and overall quite reasonable. I asked a few questions. I don't understand the very fine details of the last commit, which I think is the most subtle one. (The rest I believe are correct.)

The fact that the testsuite passes mostly unchanged is a very good test for the PR, so I am confident in its overall correctness.

| Break of { fits : string * int * string as 'a; breaks : 'a }
| Flush of { newline:bool }
| Newline
| If_newline
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was surprised by If_newline which is an odd construct (very imperative). @Octachron tells me that it is not actually used in the compiler so we could get rid of it if we wanted -- same with With_size which is also weird. I don't know if we should rather follow the oddities of Format to the letter, or remove the stuff we don't like and don't need.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIRC If_newline predates the form of Break here, and I wouldn't be surprised if If_newline is not really needed.

If you ever want to include static unicode text/symbols in diagnostics, then I expect With_size could be useful. If the unicode you want to emit is known ahead of time, the complicated question of computing its width can be worked around by specifying the size. Format itself works well in this way.

utils/format_doc.ml Outdated Show resolved Hide resolved

end

(** {1 Compatibility API} *)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could have the functions that are not so nice and mostly for compatibility in a Compat submodule. But I am okay with the current approach as well, the functions in fact tend to make sense already.

The universal variable "'a" would escape its scope
The method "f" has type "'a -> int", but the expected method type was
"'a0. 'a0 -> int"
The universal variable "'a0" would escape its scope
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note to self: I liked seeing 'a and ´b better than 'a and 'a0 (personally I am fine with any order of occurrence, but I like the variables to be easy to distinguish). But maybe in some other examples reusing the same radical helps readability?

typing/printtyp.ml Outdated Show resolved Hide resolved
typing/printtyp.ml Outdated Show resolved Hide resolved

(** {1 Compatibility API} *)

(** The functions and types below provides source compatibility with format
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Opening this module adds the document types, fold, etc.. into the environment. What do you think of moving the compatibility API into a Compat submodule ?
Also, highlighting that it's at the same level as the Core API.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One my design goal was to make the replacement of Format by Format_doc as invisible as possible.
Typically, the Core submodule is only exposed because I though it would be a shame to have a simple immutable API available and hide it away. I am not sure I would encourage people to switch to this API.

From that perspective, I think it is better for now to keep the most used features (the compatibility functions) at the toplevel of the module. But I could hide the core definitions inside the Core submodule?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have moved the type definitions and core functions inside the Core submodule, and renamed it to Doc. Like this, an open Format_doc only opens the compatibility API and the Doc submodule.

| Flush {newline=false} -> Format.pp_print_flush ppf ()
| Newline -> Format.pp_force_newline ppf ()
| If_newline -> Format.pp_print_if_newline ppf ()
| With_size _ -> ()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some functions like pp_print_substring_as, with_size and if_newline are not used. Is offering an alternative API to Format a goal ?
Otherwise, why not keep it smaller to help maintenance ?

You mentioned localization in #12182 (comment). Could it be built on top of the this API ? Making it as small as needed could help adapting it later.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My first goal was full source compatibility with Format, since this means compiler contributors don't need to know two APIs.

(And if we were to move away from Format compatibility, I would rather adopt Fmt naming convention)

For basic localization, a simple way to add support for compiler variant would be to add hooks for the format string interpreters since we only want to localize static texts.

@Octachron Octachron force-pushed the format_doc_for_error_messages branch 3 times, most recently from c23a81b to b9b1b82 Compare May 24, 2024 15:25
Copy link
Member

@gasche gasche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is some code duplication between utils/format_doc.ml and stdlib/format.ml, for example the printf-interpretation functions (output_acc and friends), and some higher-level printers such as pp_print_text, which decompose into calls into more basic pp_print_* functions.

I wonder if it would be reasonable to reduce this duplication by introducing a functor that builds this higher-level structure over the pp_print_* functions. One instance would be in stdlib/format.ml, the other in utils/format_doc.ml. One pain point is that the functor should be in the stdlib itself, which is slightly annoying -- camlinternalFormatter.ml?

parsing/location.mli Show resolved Hide resolved
utils/format_doc.ml Outdated Show resolved Hide resolved
@Octachron
Copy link
Member Author

There certainly some part that could be factorized using a Format-like functor builder, but it doesn't seem that straightforward to find a home for this functor inside the standard library. Moreover, there are also some interesting design choices to discuss in term of functor complexity versus flexibility.

Thus, I would rather report the standard library bikeshedding to another PR.

@Octachron Octachron force-pushed the format_doc_for_error_messages branch from 3fc2f25 to 7a1e00b Compare May 27, 2024 14:15
This ensures that only the compatibility API is available after an
`open Format_doc` used to replace an `open Format`.
and let the compatibility function at the top of the Location modules
to improve backward compatibility for compilerlibs users.

* add a `Doc.quoted_filename` function too.
@Octachron Octachron force-pushed the format_doc_for_error_messages branch from 7a1e00b to 07128d4 Compare May 29, 2024 14:56
@Octachron
Copy link
Member Author

I have updated the PR after some noise clean-up, I plan to merge it tomorrow.

@gasche gasche merged commit 1b09b92 into ocaml:trunk May 30, 2024
18 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants