Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Extract code blocks by ID #303

Open
wants to merge 5 commits into
base: master
Choose a base branch
from

Conversation

rizo
Copy link
Collaborator

@rizo rizo commented Feb 12, 2019

This is a work in progress attempt to implement testable code examples for odoc. Or to be more precise, the extraction of annotated code blocks from mli and mld files. See #130 for a proposed design and discussion. I might work on tools complementary to odoc to facilitate the execution of the examples and the integration with dune.

@pmetzger
Copy link
Member

There's some facility like this that got built for the Real World OCaml book, isn't there?

@lpw25
Copy link
Contributor

lpw25 commented Feb 13, 2019

So is the idea that odoc would extract the code samples and then some other mdx-like tool would call this feature of odoc as part of its processing? That could work, although mostly odoc works on .cmt files whilst I would expect that tool to work directly on mli/mld files. Also not all comments have an associated id, so its not clear how you'd want to deal with those.

Maybe you could outline how you see this working to give a clearer picture of how this part will be used?

@dbuenzli
Copy link
Contributor

That could work, although mostly odoc works on .cmt files whilst I would expect that tool to work directly on mli/mld files.

.mli files are subject to pre-processing. That's the reason why we have cmti files no ?

Maybe you could outline how you see this working to give a clearer picture of how this part will be used?

I don't know what @rizo had in mind but following the link he provided leads to this

@lpw25
Copy link
Contributor

lpw25 commented Feb 13, 2019

.mli files are subject to pre-processing. That's the reason why we have cmti files no ?

Sure, and I would expect that a tool to run the examples in them would need to perform such pre-processing. Certainly similar tools like toplevel_expect_test do. Since we already need to know the various compilation options in order to run the examples that does not seem unreasonable.

Also I would expect such a tool to work like other "expect test" style tools, so to produce an updated version of the ml file, so that we can use dune's promote workflow -- that clearly requires taking the mli file as input.

I don't know what @rizo had in mind but following the link he provided leads to this

There are a variety of suggestions on the thread @rizo linked so I wasn't sure which suggestion was being referred to. It does look like he has the comment you point to in mind. That comment though does not constitute a full proposal. It is not clear to me how you would use those commands to build a useful workflow.

@lpw25
Copy link
Contributor

lpw25 commented Feb 13, 2019

I should say though, that I think any proposal is going to require the syntax to allow code blocks to be labelled in some way -- so that part seems useful regardless of how we build the surrounding tooling.

@dbuenzli
Copy link
Contributor

dbuenzli commented Feb 13, 2019

That comment though does not constitute a full proposal. It is not clear to me how you would use those commands to build a useful workflow.

I'm not sure what you are refering to. Basically with the proposal you can see .cmti or .mld file as a tar archive with text files that can be extracted. The rest is yours.

To give an example say you have this example.mld file:


{0 Examples}

The hello world program goes like this:

{hello.ml[
let () = print_endline "Hello"
]}

You may not need this data:

{data.json[
  [1; 2]
]}

Then odoc extract --list-id example.mld outputs:

hello.ml
data.json

Your build system can then pickup these file names to invoke:

odoc extract --id hello.ml -o hello.ml example.mld
odoc extract --id data.json -o data.json example.mld

At which point you may instruct your build system to do something with the generated hello.ml file.

@lpw25
Copy link
Contributor

lpw25 commented Feb 13, 2019

Sure, but how do I map the results of using hello.ml back to the example.mld file afterwards? We lose the mapping between the comments and the code so you can't use this to implement an "expect test" style workflow -- which is I think the most desirable workflow for this kind of thing.

@dbuenzli
Copy link
Contributor

Sure, but how do I map the results of using hello.ml back to the example.mld file afterwards?

One can add an option to extract to emit linenum directives in the output.

We lose the mapping between the comments and the code so you can't use this to implement an "expect test" style workflow -- which is I think the most desirable workflow for this kind of thing.

I don't know exactly what this is but wouldn't the above suffice ?

@lpw25
Copy link
Contributor

lpw25 commented Feb 13, 2019

Perhaps I should clarify what I mean by an "expect test" workflow. I think that one of the most useful aspects of being able to run examples from documentation is to be able to write things like:

{[
    # let foo x y = y;;
    val foo : 'a -> 'b -> 'b = <fun>
    # foo 1 3;;
    val - : int = 3
]}

and have a tool which checks that the results of running that code in the toplevel match what I've written there. In an "expect test" style workflow this is achdieved by having the tool print a new version of the .mli/.mld file with the actual generated output. Dune has special support for such tools -- it will show the diff between the actual and expected output as an error and you can run dune promote to copy the actual output over the original file.

@lpw25
Copy link
Contributor

lpw25 commented Feb 13, 2019

As examples this is how the mdx and toplevel_expect_tests tools work.

@dbuenzli
Copy link
Contributor

dbuenzli commented Feb 13, 2019

Well I don't expect odoc to make the whole promotion business. Aren't #linenum directives surrounding each extracted snippet sufficient for the promotion tool to do its business ?

@lpw25
Copy link
Contributor

lpw25 commented Feb 13, 2019

Well I don't expect odoc to make the whole promotion business.

Me neither. I don't think odoc needs to provide support for any of this. As long as its comment parser is available as a library all of these things can easily be implemented as separate tools -- including the feature being suggested here.

@dbuenzli
Copy link
Contributor

As long as its comment parser is available as a library all of these things can easily be implemented as separate tools -- including the feature being suggested here.

Maybe but OTOH odoc is the tool that processes .mld files, I'm not sure there's any gain to introduce yet another tool, especially if the scheme mentioned above is reasonably usable for most workflows.

@lpw25
Copy link
Contributor

lpw25 commented Feb 13, 2019

I can see how this proposal would be better than what I'm suggesting for literate programming, is that what you have in mind? In which case, would only having the support for doing it for .mld files be sufficient? That version makes more sense to me.

My other main concern is the semantics of the "id"s. Most systems that have quoting of code fragments like this (markdown, org-mode, etc.) have support for specifying the language of the code fragment along with other data about how the code should be extracted. Specifying the language seems particularly useful for odoc to allow us to correctly do code highlighting in the resulting documentation. I'm worried that we won't easily be able to do that if we merge this patch as is because we'll already have specified a different semantics for the text that appears between { and [.

@dbuenzli
Copy link
Contributor

dbuenzli commented Feb 13, 2019

I can see how this proposal would be better than what I'm suggesting for literate programming, is that what you have in mind?

That's one of the worfklow along with .mld based tutorials. I'm also interested in extracting my code samples from .cmti files and make sure they compile though. If I take these examples I'm not really interested in topexpecting them. Just extracting them with linenum directives and make sure they actually compile in my dev builds (and potientially install them without linenum directives as sample code).

My other main concern is the semantics of the "id"s. Most systems that have quoting of code fragments like this (markdown, org-mode, etc.) have support for specifying the language of the code fragment along with other data about how the code should be extracted.

Agreed I wouldn't be against more structure here. FWIW in CommonMark, the id is an "info string". and is it only suggested that the first word should specify the language. See here.

@lpw25
Copy link
Contributor

lpw25 commented Feb 13, 2019

If I take these examples I'm not really interested in topexpecting them

The expect test style approach wouldn't give you anything extra, but it would give you what you want in these cases. So if we assume that such support is on the way then I think you could support the extract command only for .mld files and you would have all the things you are after.

@dbuenzli
Copy link
Contributor

dbuenzli commented Feb 13, 2019

Not sure I understand the deep reason for objecting doing this on cmti files aswell :-) Presumably it's the same code path.

@lpw25
Copy link
Contributor

lpw25 commented Feb 13, 2019

I just want to discourage people from trying to use it to implement the execution of code in comments in mli files since I don't think it is the best way to do it. I'd prefer it if odoc only had commands for things that have a clear use case and where the command is the best approach for addressing that use case.

Whereas the mld version has a clear use case in literate programming and, as you said, odoc is the main tool that understands mld files so it is the logical place to put the feature.

@rizo
Copy link
Collaborator Author

rizo commented Feb 13, 2019

I have pushed an early attempt to document this feature (see draft version here). It mostly focuses on user experience and integration with dune. I understand this is might be significantly out of odoc's scope but, as I mentioned previously, it is not my intention to exclusively focus on low-level odoc-specific interfaces. It is not clear (to me) how all the discussed details (extraction, execution, line numbers, test promotion, etc) connect together into something users can use. And that is my focus.

Let me know if it is preferred to move this discussion to dune.

CC @rgrinberg: you might have an opinion about this.


So is the idea that odoc would extract the code samples and then some other mdx-like tool would call this feature of odoc as part of its processing? That could work, although mostly odoc works on .cmt files whilst I would expect that tool to work directly on mli/mld files.

I think both options could be supported. odoc's parser library is already exposed and it can be used to do the extraction.

Also not all comments have an associated id, so its not clear how you'd want to deal with those.

I covered this in the proposal. Essentially all code blocks without an associated id can be extracted as a common "anonymous" group.

Maybe you could outline how you see this working to give a clearer picture of how this part will be used?

I do not at this stage have a final plan for the fully integrated solution. One of the reasons I submitted this PR as WIP was to start this discussion. And, as you mentioned, annotating code blocks is needed in any case.

I started experimenting with mdx and it seems like it could be used to build a tool (I'm calling it odoc-mdx) that would use odoc's parser as a library to extract the code blocks from mli/mld files, execute them and generate .corrected files. I'm hoping dune can be extended to natively support this workflow.


My other main concern is the semantics of the "id"s. Most systems that have quoting of code fragments like this (markdown, org-mode, etc.) have support for specifying the language (...)

This is something I mention in my proposal. Essentially using filenames gives us both: the ability to group code blocks and the language information. It does require odoc to interpret code block annotations as file names, but I think it is a reasonable limitation.

I'm worried that we won't easily be able to do that if we merge this patch as is because we'll already have specified a different semantics for the text that appears between { and [.

By that do you mean "other data about how the code should be extracted"? For OCaml, at least the execution options could be passed as # directives (similar to how toplevel_expect_test works).

Agreed I wouldn't be against more structure here. FWIW in CommonMark, the id is an "info string". and is it only suggested that the first word should specify the language.

Mdx actually relies on this. I think we could adopt this model. How would multiple code blocks be grouped for extraction into the same file though?

Maybe something like the following could work? The extraction command would match on the file value.

{ocaml file=hello.ml[ ... ]}

Not sure I understand the deep reason for objecting doing this on cmti files aswell :-)

I just want to discourage people from trying to use it to implement the execution of code in comments in mli files

I'm not sure I understand what this means. Does this only apply to the CLI? Wouldn't the extraction from cmti (or mli) files be required to implement the expect-based workflow?

Copy link
Contributor

@dbuenzli dbuenzli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks very much @rizo for the design doc. These things help a lot to clarify things and of course provide the basis for the documentation of the feature.

I have left a few comments but basically it quite matches what I think I would like to have. The only thing I'm a little bit sceptical about is the extraction of anonymous blocks, it will likely often be made of garbage so I wouldn't bother.

Other than that the syntactic bit needs to be clarified, I made a proposal that I think blends well with the current ocamldoc language conventions (think of {:uri}).


The file names used to annotated code blocks are also used by odoc to decide
what language should be used for syntax highlighting in the generated HTML. The
language is decided based on the file name’s extension.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As already said that may not be entirely sufficient (I guess clashes do exist). See below for more on this.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clashes aside, I think a more structured annotation is needed indeed.


**Warning:** code blocks without a file name will not have syntax highlighting.
Once this feature is implemented, the currently used automatic language
inference should be disabled.
Copy link
Contributor

@dbuenzli dbuenzli Feb 13, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is a good idea. Historically code blocks meant OCaml. If you didn't want syntax highlighting you would use a verbatim block ({v v}). For small anonymous snippets it's tedious to always have to specify the language. It's better to simply default to OCaml, that's what the ocamldoc language was designed for.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I generally dislike implicit promotions, but let's respect the history in this case. I updated the proposal to match your suggestion.

Input cmti, cmt, cmi, mli or mld file.


Odoc 11VERSION11 odoc-extract-code(1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing is a way to extract the list of names and their language. Following the current "odoc way" this could well be odoc extract-code-targets which would list each on their line the set of names and their tokens (see below) existing in the object file.

filename token* 

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was in my plans but I forgot to include it. Thanks.


-o PATH, --output=PATH
Output file path. If omitted, the provided NAME will be used.
Required for extraction of anonymous code blocks.
Copy link
Contributor

@dbuenzli dbuenzli Feb 13, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For convenience, one could add an --all option which would extracts everything in one go and -o would then denote the destination directory. That way you can distribute a tutorial as an .mld file and simply instruct someone to odoc extract-code --all -o . tutorial.mld to be setup.

For example this gist and associated blog post could well be distributed as a single .mld file (see for example the "# Generate the html page" in the toplevel comment...).

| **Anonymous code block** | **Named code block** |
| ------------------------ | ----------------------------------- |
| `"{[" <content> "]}"` | `"{" <filename> "[" <content> "]}"` |

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One thing this proposal doesn't mention but must be clarified is what happens to named code block which are within stop comments (**/**) ... (**/**).

I think these should be included in the file aswell (they allow to hide setup minutiae you may not want to have in the rendered document).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But… do we get these in .mld files ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Corrected the proposal, let me know if this is what you had in mind.

But… do we get these in .mld files ?

Maybe the extended code block annotation could have an option for that...


- Should odoc require code block annotations to be filenames with extension?
The extension could be used to identify the language and correctly do code
highlighting. On the other hand the code blocks could be annotated only with
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe one way would be to simply have these two forms

{[:filename token* [ ... ]}
{[ token* [ ... ]}

without more constraints than that.

With filename a syntax that allows a relative file path (that way an .mld file can specify a source file hierarchy) and token arbitrary identifiers that allow the : character.

That way authors and processors are free to collaborate on the meaning they want to give to tokens. Of course odoc should standardize a few behaviour like:

  • On {[:f.$EXT [...]}, odoc HTML gen will try to highlight the block using the most likely language associated to the extension $EXT (if present).
  • On {[:f.$EXT lang:$L [...]}, odoc HTML gen will try to highlight using language $L.
  • On {[ lang:$L [...]}, odoc HTML gen will try to highlight using language $L.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a reasonable proposal. I also think this should be the focus for the actual work in the current PR. I'll elaborate on this a bit later.


--anonymous
Extract code blocks without name. Cannot be used with the `--name'
option.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not. Note however that my guess is that unless the author pays particular attention to her anonymous code block the result will likely be garbage most of the time (you may have code showing errors, or even get different languages...)

Copy link
Contributor

@dbuenzli dbuenzli Feb 13, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Echoing @jsomers desire for certain fileless senarios (though formally you could always simply specify a filename to use by convention). This could be --anonymous=TOKEN, and would extract anonymous code blocks that have the token TOKEN (e.g. ocamlx).

It would also prevent the likely garbage problem I mentioned, if completely unconstrained.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How is it anonymous if it has a name (ocamlx in your example)?

But I do understand your concerns. I will try to address this in the new proposal for code block annotations.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still regarding anonymous blocks:

The only thing I'm a little bit sceptical about is the extraction of
anonymous blocks, it will likely often be made of garbage so I wouldn't
bother.

It was one of the @lpw25's concerns, I believe. I imagine for large code bases
it would be useful to operate on existing unmarked code blocks in some way. So
if people do want to execute them they might get an opportunity to fix the
"garbage".

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How is it anonymous if it has a name (ocamlx in your example)?

In the sense it has no filename.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Basically in the two forms here. The first one is a named code block and the second one is an anonymous one.

I quite like the idea of being able to extract by token, this leaves a lot of flexibity to authors to attribute meaning to what they want to extract.

Copy link
Collaborator Author

@rizo rizo Feb 14, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also like the idea of token annotations. Without wanting to commit to a concrete syntax, something like {ocaml id=BLOCK_ID[...]} could work, where BLOCK_ID could be anything including a file name. This of course would be used during extraction. I'd make the --output option required though, to clarify that the id is not necessarily a file name.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still think it's preferable to have to direct support for filenames.

@jsomers
Copy link

jsomers commented Feb 13, 2019

Thanks @rizo for putting together this design. It jibes with what I've been thinking about, though I think there is one key difference.

Should odoc require code block annotations to be filenames with extension?

I don't think so. For an mli file, the workflow I imagined was:

  1. In a doc comment that includes a code example you'd like to execute, you add a small annotation, either {ocaml| or {x| (for "execute" -- we can assume the block is OCaml; perhaps you could have another token to specify the language).

  2. In your dune file in the mli's directory, you specify that you want to execute marked code blocks.

  3. If your code blocks contain toplevel syntax, like

# 1 + 1
- : int = 2

then you go through the diff-and-promote workflow. (That is, if the output is incorrect, you generate a .corrected file and allow the user to easily diff against and accept this correction.) There shouldn't be any need to add line number directives -- the tool should figure out where in the mli file this code lives, and should be able to patch in the correction appropriately. (Toplevel_expect_test and mdx both do this.)

Otherwise, if you're not dealing with toplevel-style code, you simply throw a compiler error if the code block doesn't compile.


Note that this workflow is almost exactly the same as processing a markdown file with mdx, except that instead of executing the code in triple-backtick blocks, you're executing the code in {ocaml| or {x| blocks.

For mld files, the flow would be basically identical to mdx, except that instead of the base markup language being Markdown, it would be the markup language used by odoc.

By doing things this way, without filenames, you do give up the ability to write code in one block that references input from files extracted from other code blocks -- but I don't think that's nearly as important as being able to simply run small self-contained examples that use the library you're documenting.

As the user of the tool, I don't want to have to think about the code in my doc comments being extracted out into a separate file where it's then executed; I just want the code executed. If the tool needs to extract my code to separate files under the hood, that's fine, but I don't think that detail should be exposed to the user. The user (at least this user) just wants to say "execute this block" on some blocks.

@dbuenzli
Copy link
Contributor

dbuenzli commented Feb 13, 2019

@jsomers I don't think what is described here prevents from implementing a system that does exactly what you want. What is described here is rather the plumbing and a more general mecanism that allows you to implement the system you would like as a special case.

There shouldn't be any need to add line number directives -- the tool should figure out where in the mli file this code lives, and should be able to patch in the correction appropriately.

Extracting with line directives is precisely what will allow the underlying tool to figure out where in the mli/mld file the code you are processing lived and, for example if you compile the extraction, to actually report any compilation error in the original file and location rather than in the extracted file.

@jsomers
Copy link

jsomers commented Feb 13, 2019

@dbuenzli

There shouldn't be any need to add line number directives -- the tool should figure out where in the mli file this code lives, and should be able to patch in the correction appropriately.

Extracting with line directives is precisely what will allow the underlying tool to figure out where in the mli/mld file the code you are processing lived and, for example if you compile the extraction, to actually report any compilation error in the original file and location rather than in the extracted file.

Ah, I see -- my worry is that we would require the end user to add line-number directives by adding a special notation. But it sounds like you're saying we wouldn't.

@dbuenzli
Copy link
Contributor

dbuenzli commented Feb 13, 2019

Ah, I see -- my worry is that we would require the end user to add line-number directives by adding a special notation. But it sounds like you're saying we wouldn't.

No, that would be absolutely terrible :-) Basically the lineno stuff would work as as follows. Suppose you have your file.mld:

Bla bla
{ocamlx[
let f x = x
]}
Ho Ho
{ocamlx[ 
let () = print_endliiine "bla.ml"
]}

Invoking odoc extract-code --with-line-numbers --anonymous=ocamlx -o ocamlx.ml file.mld would produce the ocamlx.ml file:

#3 "file.mld" 
let f x = x
#7 "file.mld"
let () = print_endliiine "bla.ml"

Now you can try for yourself to compile that with ocamlc that's what gets reported:

> ocamlc ocamlx.ml
File "file.mld", line 7, characters 9-24:
Error: Unbound value print_endliiine
Hint: Did you mean print_endline?


Library authors are encouraged to include examples and short snippets of code
in documentation to demonstrate how to effectively use their library. Such code
snippets are included in docstrings as code blocks and therefor cannot be
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/therefor/therefore

snippets are included in docstrings as code blocks and therefor cannot be
executed and tested in the same way regular source files are. This leads to
code duplication for library authors who want to make sure their examples can
be correctly executed, and to out of date examples when they forget to update
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"out-of-date" for the adjective.


To address this problem odoc implements the ability to extract code blocks from
documented interfaces and documentation pages (`mli` and `mld` files
respectively) into source code files. With this build systems can implement
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/With this/With this,


## Named code blocks

In the new version of odoc code blocks can be annotated with a file name. This
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/odoc/odoc,

Both named and anonymous code blocks can be extracted by odoc via the
command-line interface. Code blocks with the same file name in a given
documentation file will be concatenated and written into a file with that name.
Optionally a different output file name for a given group can be provided.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/Optionally/Optionally,

number directives](https://caml.inria.fr/pub/docs/manual-ocaml/lex.html#sec86)
in the OCaml manual).

**Note**: unrelated code blocks do not need to have a unique file name, it is
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this mean they will all go into one file example.ml? If so, this should read something like "unrelated code blocks can be put into the same file, and so don't need unique file names." However, why would it be good to put unrelated examples into one file?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

However, why would it be good to put unrelated examples into one file?

If an interface has hundreds of functions each one of them with examples, will users want to provide a unique name to each code block to get code execution and syntax highlighting?

It is clear that filenames are not a satisfactory way to annotate code blocks, so this needs to be changed in the proposal anyway.

@lpw25
Copy link
Contributor

lpw25 commented Feb 14, 2019

Thanks for the design doc @rizo that makes your plan much clearer. I think the aim is a good one but in general I think that the proposed extract-code command is not the right approach to implement what you are aiming for. From your comments I think you already know what the disadvantages and alternatives are, but I'll make them explicit here anyway:

The need to talk about external files is an unnecessary burden for users. In the common case they simply want to execute all the comments in a single environment. This behaviour should be the default. If they wish to keep some comments separate there should be an optional argument like the env argument in mdx to specify named environments with no requirement that they be unique filenames, something like:

{ocaml env=foo[
...
]}

If they wish not to run a particular comment there should be an optional argument for that to:

{ocaml executed=false[
...
]}

The dune support should simply be something like (execute-examples) in the documentation stanza. Having to manage the extraction of multiple files is an unnecessary burden on the user.

The extract-code command is not sufficient to implement the "promote" aspect robustly. It provides no mechanism to pass arguments about how the code should be run to the actually testing tool -- for example mdx's non-deterministic and version options on code snippets would be useful. Line directives are not a robust means of communicating when blocks start and end -- if I put a line directive in my comment fragments the tool will simply break. To do this kind of thing robustly you would need some quoting mechanism and to use a format other than just an .ml file.

The extract-code command is also not necessary to implement the "promote" aspect. The comment parser is already available and the tool can simply use that on the parsetree of the mli file directly.

I started experimenting with mdx and it seems like it could be used to build a tool (I'm calling it odoc-mdx) that would use odoc's parser as a library to extract the code blocks from mli/mld files, execute them and generate .corrected files. I'm hoping dune can be extended to natively support this workflow.

This is I think a better approach than extract-code and I think it would be better to focus on this.

Maybe something like the following could work? The extraction command would match on the file value.

{ocaml file=hello.ml[ ... ]}

I think that would be better for implementing extract-code. As I said above, extract-code on mld files is as useful command for literate programming and doing it this way will prevent it from getting in the way of the mdx-based alternative.

@dbuenzli
Copy link
Contributor

I think that would be better for implementing extract-code. As I said above, extract-code on mld files is as useful command for literate programming and doing it this way will prevent it from getting in the way of the mdx-based alternative.

I'm not exactly sure how something generic like this gets in the way of mdx-based alternatives. Let's seperate concerns here:

  1. Basically the {[]} has to be extended in a useful way. So let's agree on the syntax. What's your proposal ? Simply a list of uninterpreted tokens ? I'm fine with this aswell though we should be careful that such a token allows to express a file path.

  2. Regarding extract-code if you think it's not useful for you and you can do better using an external tool then by all means go build that tool. Personally I have quite a few uses for it and I would like to see it implemented in some way as proposed here.

@lpw25
Copy link
Contributor

lpw25 commented Feb 14, 2019

These samples are full executables.

Thanks that makes things clearer. So you have some literate programming within the mli files of a library. That's a reasonable use case. My preference would be to have extract-code take the .mli file directly as input, since this should really be an operation on source files rather than installed files. It shouldn't be much harder to implement than the cmti version -- note that dune already takes care of preprocessing the mli file to a (marshalled) .mli file so you don't need to deal with -pp -- but if that is not the case then the .cmti file based version is a reasonable alternative.

@dbuenzli
Copy link
Contributor

I agree overall that the default behavior should be execute all ocaml code blocks without any extra configuration.

I don't think this concerns this actual proposal but I would rather not do this and rely on a special token for that (e.g. ocamlx).

You still need a bit of ordering care when you start executing code blocks. When I don't want to care about this I prefer not having to care... by which I mean that having to specify an exec=false would mean I would have to care. For example I want to be able to simply add an ocaml code block anywhere in the document without having to think about it or have to define all its free variables so that it can get executed.

For all it's awesomeness and shininess I still think that executing code blocks in API docs remains quite a limited tool as soon as you escape the realm of documenting basic data structures.

@rizo
Copy link
Collaborator Author

rizo commented Feb 14, 2019

The need to talk about external files is an unnecessary burden for users. In
the common case they simply want to execute all the comments in a single
environment. This behaviour should be the default. If they wish to keep some
comments separate there should be an optional argument like the env (...)

I added this to the requirements in the proposal. To be clear the default behavior in my current proposal is to execute all code blocks that have a file name with the ml extension. Users don't have to think about extraction at all (the extract_code option in dune is optional, and so is execute_code).

I do agree that annotating code blocks with the language and allowing arguments like env is a better approach. I don't see why different environments couldn't be extracted into files though, if users really want that. The two uses cases being: (1) using other code blocks as data files for execution and (2) installing examples for package distribution. Use case 1 would be done by the mdx-style tool (i.e. it would extract the data files before execution), and use case 2 would be done by dune (or other build systems). I think this applies to both mld and mli files.

Non of this has to be implemented for the initial version, of course. We can start with a simple mdx-style tool that executes ocaml code blocks (potentially with the x annotation) and ignores all others. At this stage extraction for mld files can be implemented to support self-contained literate programming documents described by @dbuenzli. Finally, the extraction could be implemented for mli files, along with dune support.

I'll try to clarify this in my updated proposal.

@rizo
Copy link
Collaborator Author

rizo commented Feb 14, 2019

@lpw25:

The extract-code command is also not necessary to implement the "promote" aspect.

I agree. As long as the code blocks are properly annotated with language and
execution options, the complementary tooling can implement the promotion
workflow without the extract-code command.

@rizo
Copy link
Collaborator Author

rizo commented Feb 14, 2019

I don't think this concerns this actual proposal but I would rather not do this and rely on a special token for that (e.g. ocamlx).

I personally don't like the ocamlx as an annotation. If we want to annotate code blocks with language identifiers they should be known language identifiers. We would need to do the same for reason syntax, by the way, and having reasonx or rex doesn't look good to me.

But yeah, executing everything unconditionally is also quite extreme. I'll have this in mind and will make a proposal for code block annotations that allow this kind of flexibility.

@lpw25
Copy link
Contributor

lpw25 commented Feb 14, 2019

You still need a bit of ordering care when you start executing code blocks. When I don't want to care about this I prefer not having to care... by which I mean that having to specify an exec=false would mean I would have to care.

I suspect that the best thing to do here is to let the default be specified. I would expect both extract-code and the mdx-like tooling to take an option that said whether to assume exec=false or exec=true as the default.

@rizo
Copy link
Collaborator Author

rizo commented Feb 14, 2019

I suspect that the best thing to do here is to let the default be specified.

Then it's not really a default, in that case ;-).

But I agree that this could be configured on a level higher than a code block. The question is more about what would be the default in case users don't provide this option.

@dbuenzli
Copy link
Contributor

I suspect that the best thing to do here is to let the default be specified.

I suspect this is really not a good idea. It's always better to keep documents self-describing to the readers. If you do this then I must hunt in the build system what default you choose and then remember it in my head when I read a code block.

I would really go for not executable by default and maybe simply {$LANG exec[ ... ]} for those blocks that you want to execute. That works with any language, is clear to the reader and not hard to type.

@lpw25
Copy link
Contributor

lpw25 commented Feb 14, 2019

There are going to be plenty of people who want to test all their code in comments with "odocx" and they're going to be pretty annoyed with having to write ocaml exec=true hundreds of times.

@dbuenzli
Copy link
Contributor

ocaml exec=true hundreds of times.

That's the reason why I proposed ocaml exec ;-)

@dbuenzli
Copy link
Contributor

(even {exec[]} since code blocks default to OCaml)

@XVilka
Copy link

XVilka commented Jun 18, 2019

There are conflicts now, so should be rebased.

@@ -88,6 +88,14 @@
ignore foo
]}

Code blocks can have an identifier attached to them:

{ocaml[
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Misleading example? Using ocaml as the identifier makes it looks like an annotation specifying the language, rather than an identifier to enable references to this specific block from elsewhere.

Suggested change
{ocaml[
{print-two-plus-two[

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current definition of annotations needs to be changed in my opinion as it has some limitations. Unfortunately I haven't been available to work on this in the last few months.

To clarify the misunderstanding here: the code block annotation is not used to reference a specific block from elsewhere (it is not a unique identifier). It annotates a class of blocks to be extracted. It could be a language or a filename, or any other user-defined "class".

Not sure if this is helpful, but let me know if you have any other questions or suggestions.

@dbuenzli
Copy link
Contributor

Could we move on this ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

9 participants