[WIP] Extract code blocks by ID #303

rizo · 2019-02-12T18:17:42Z

This is a work in progress attempt to implement testable code examples for odoc. Or to be more precise, the extraction of annotated code blocks from mli and mld files. See #130 for a proposed design and discussion. I might work on tools complementary to odoc to facilitate the execution of the examples and the integration with dune.

pmetzger · 2019-02-12T22:28:38Z

There's some facility like this that got built for the Real World OCaml book, isn't there?

lpw25 · 2019-02-13T07:26:09Z

So is the idea that odoc would extract the code samples and then some other mdx-like tool would call this feature of odoc as part of its processing? That could work, although mostly odoc works on .cmt files whilst I would expect that tool to work directly on mli/mld files. Also not all comments have an associated id, so its not clear how you'd want to deal with those.

Maybe you could outline how you see this working to give a clearer picture of how this part will be used?

dbuenzli · 2019-02-13T08:22:58Z

That could work, although mostly odoc works on .cmt files whilst I would expect that tool to work directly on mli/mld files.

.mli files are subject to pre-processing. That's the reason why we have cmti files no ?

Maybe you could outline how you see this working to give a clearer picture of how this part will be used?

I don't know what @rizo had in mind but following the link he provided leads to this

lpw25 · 2019-02-13T09:12:28Z

.mli files are subject to pre-processing. That's the reason why we have cmti files no ?

Sure, and I would expect that a tool to run the examples in them would need to perform such pre-processing. Certainly similar tools like toplevel_expect_test do. Since we already need to know the various compilation options in order to run the examples that does not seem unreasonable.

Also I would expect such a tool to work like other "expect test" style tools, so to produce an updated version of the ml file, so that we can use dune's promote workflow -- that clearly requires taking the mli file as input.

I don't know what @rizo had in mind but following the link he provided leads to this

There are a variety of suggestions on the thread @rizo linked so I wasn't sure which suggestion was being referred to. It does look like he has the comment you point to in mind. That comment though does not constitute a full proposal. It is not clear to me how you would use those commands to build a useful workflow.

lpw25 · 2019-02-13T09:16:11Z

I should say though, that I think any proposal is going to require the syntax to allow code blocks to be labelled in some way -- so that part seems useful regardless of how we build the surrounding tooling.

dbuenzli · 2019-02-13T09:47:18Z

That comment though does not constitute a full proposal. It is not clear to me how you would use those commands to build a useful workflow.

I'm not sure what you are refering to. Basically with the proposal you can see .cmti or .mld file as a tar archive with text files that can be extracted. The rest is yours.

To give an example say you have this example.mld file:


{0 Examples}

The hello world program goes like this:

{hello.ml[
let () = print_endline "Hello"
]}

You may not need this data:

{data.json[
  [1; 2]
]}

Then odoc extract --list-id example.mld outputs:

hello.ml
data.json

Your build system can then pickup these file names to invoke:

odoc extract --id hello.ml -o hello.ml example.mld
odoc extract --id data.json -o data.json example.mld

At which point you may instruct your build system to do something with the generated hello.ml file.

lpw25 · 2019-02-13T11:31:12Z

Sure, but how do I map the results of using hello.ml back to the example.mld file afterwards? We lose the mapping between the comments and the code so you can't use this to implement an "expect test" style workflow -- which is I think the most desirable workflow for this kind of thing.

dbuenzli · 2019-02-13T11:36:33Z

Sure, but how do I map the results of using hello.ml back to the example.mld file afterwards?

One can add an option to extract to emit linenum directives in the output.

We lose the mapping between the comments and the code so you can't use this to implement an "expect test" style workflow -- which is I think the most desirable workflow for this kind of thing.

I don't know exactly what this is but wouldn't the above suffice ?

lpw25 · 2019-02-13T11:39:31Z

Perhaps I should clarify what I mean by an "expect test" workflow. I think that one of the most useful aspects of being able to run examples from documentation is to be able to write things like:

{[
    # let foo x y = y;;
    val foo : 'a -> 'b -> 'b = <fun>
    # foo 1 3;;
    val - : int = 3
]}

and have a tool which checks that the results of running that code in the toplevel match what I've written there. In an "expect test" style workflow this is achdieved by having the tool print a new version of the .mli/.mld file with the actual generated output. Dune has special support for such tools -- it will show the diff between the actual and expected output as an error and you can run dune promote to copy the actual output over the original file.

lpw25 · 2019-02-13T11:42:16Z

As examples this is how the mdx and toplevel_expect_tests tools work.

dbuenzli · 2019-02-13T11:45:05Z

Well I don't expect odoc to make the whole promotion business. Aren't #linenum directives surrounding each extracted snippet sufficient for the promotion tool to do its business ?

lpw25 · 2019-02-13T11:50:06Z

Well I don't expect odoc to make the whole promotion business.

Me neither. I don't think odoc needs to provide support for any of this. As long as its comment parser is available as a library all of these things can easily be implemented as separate tools -- including the feature being suggested here.

dbuenzli · 2019-02-13T11:57:14Z

As long as its comment parser is available as a library all of these things can easily be implemented as separate tools -- including the feature being suggested here.

Maybe but OTOH odoc is the tool that processes .mld files, I'm not sure there's any gain to introduce yet another tool, especially if the scheme mentioned above is reasonably usable for most workflows.

lpw25 · 2019-02-13T12:12:35Z

I can see how this proposal would be better than what I'm suggesting for literate programming, is that what you have in mind? In which case, would only having the support for doing it for .mld files be sufficient? That version makes more sense to me.

My other main concern is the semantics of the "id"s. Most systems that have quoting of code fragments like this (markdown, org-mode, etc.) have support for specifying the language of the code fragment along with other data about how the code should be extracted. Specifying the language seems particularly useful for odoc to allow us to correctly do code highlighting in the resulting documentation. I'm worried that we won't easily be able to do that if we merge this patch as is because we'll already have specified a different semantics for the text that appears between { and [.

dbuenzli · 2019-02-13T12:26:55Z

I can see how this proposal would be better than what I'm suggesting for literate programming, is that what you have in mind?

That's one of the worfklow along with .mld based tutorials. I'm also interested in extracting my code samples from .cmti files and make sure they compile though. If I take these examples I'm not really interested in topexpecting them. Just extracting them with linenum directives and make sure they actually compile in my dev builds (and potientially install them without linenum directives as sample code).

My other main concern is the semantics of the "id"s. Most systems that have quoting of code fragments like this (markdown, org-mode, etc.) have support for specifying the language of the code fragment along with other data about how the code should be extracted.

Agreed I wouldn't be against more structure here. FWIW in CommonMark, the id is an "info string". and is it only suggested that the first word should specify the language. See here.

lpw25 · 2019-02-13T13:25:32Z

If I take these examples I'm not really interested in topexpecting them

The expect test style approach wouldn't give you anything extra, but it would give you what you want in these cases. So if we assume that such support is on the way then I think you could support the extract command only for .mld files and you would have all the things you are after.

dbuenzli · 2019-02-13T14:18:14Z

Not sure I understand the deep reason for objecting doing this on cmti files aswell :-) Presumably it's the same code path.

lpw25 · 2019-02-13T15:41:58Z

I just want to discourage people from trying to use it to implement the execution of code in comments in mli files since I don't think it is the best way to do it. I'd prefer it if odoc only had commands for things that have a clear use case and where the command is the best approach for addressing that use case.

Whereas the mld version has a clear use case in literate programming and, as you said, odoc is the main tool that understands mld files so it is the logical place to put the feature.

rizo · 2019-02-13T19:10:15Z

I have pushed an early attempt to document this feature (see draft version here). It mostly focuses on user experience and integration with dune. I understand this is might be significantly out of odoc's scope but, as I mentioned previously, it is not my intention to exclusively focus on low-level odoc-specific interfaces. It is not clear (to me) how all the discussed details (extraction, execution, line numbers, test promotion, etc) connect together into something users can use. And that is my focus.

Let me know if it is preferred to move this discussion to dune.

CC @rgrinberg: you might have an opinion about this.

So is the idea that odoc would extract the code samples and then some other mdx-like tool would call this feature of odoc as part of its processing? That could work, although mostly odoc works on .cmt files whilst I would expect that tool to work directly on mli/mld files.

I think both options could be supported. odoc's parser library is already exposed and it can be used to do the extraction.

Also not all comments have an associated id, so its not clear how you'd want to deal with those.

I covered this in the proposal. Essentially all code blocks without an associated id can be extracted as a common "anonymous" group.

Maybe you could outline how you see this working to give a clearer picture of how this part will be used?

I do not at this stage have a final plan for the fully integrated solution. One of the reasons I submitted this PR as WIP was to start this discussion. And, as you mentioned, annotating code blocks is needed in any case.

I started experimenting with mdx and it seems like it could be used to build a tool (I'm calling it odoc-mdx) that would use odoc's parser as a library to extract the code blocks from mli/mld files, execute them and generate .corrected files. I'm hoping dune can be extended to natively support this workflow.

My other main concern is the semantics of the "id"s. Most systems that have quoting of code fragments like this (markdown, org-mode, etc.) have support for specifying the language (...)

This is something I mention in my proposal. Essentially using filenames gives us both: the ability to group code blocks and the language information. It does require odoc to interpret code block annotations as file names, but I think it is a reasonable limitation.

I'm worried that we won't easily be able to do that if we merge this patch as is because we'll already have specified a different semantics for the text that appears between { and [.

By that do you mean "other data about how the code should be extracted"? For OCaml, at least the execution options could be passed as # directives (similar to how toplevel_expect_test works).

Agreed I wouldn't be against more structure here. FWIW in CommonMark, the id is an "info string". and is it only suggested that the first word should specify the language.

Mdx actually relies on this. I think we could adopt this model. How would multiple code blocks be grouped for extraction into the same file though?

Maybe something like the following could work? The extraction command would match on the file value.

{ocaml file=hello.ml[ ... ]}

Not sure I understand the deep reason for objecting doing this on cmti files aswell :-)

I just want to discourage people from trying to use it to implement the execution of code in comments in mli files

I'm not sure I understand what this means. Does this only apply to the CLI? Wouldn't the extraction from cmti (or mli) files be required to implement the expect-based workflow?

dbuenzli

Thanks very much @rizo for the design doc. These things help a lot to clarify things and of course provide the basis for the documentation of the feature.

I have left a few comments but basically it quite matches what I think I would like to have. The only thing I'm a little bit sceptical about is the extraction of anonymous blocks, it will likely often be made of garbage so I wouldn't bother.

Other than that the syntactic bit needs to be clarified, I made a proposal that I think blends well with the current ocamldoc language conventions (think of {:uri}).

dbuenzli · 2019-02-13T19:23:20Z

notes/testable_examples.md

+
+The file names used to annotated code blocks are also used by odoc to decide
+what language should be used for syntax highlighting in the generated HTML. The
+language is decided based on the file name’s extension.


As already said that may not be entirely sufficient (I guess clashes do exist). See below for more on this.

Clashes aside, I think a more structured annotation is needed indeed.

dbuenzli · 2019-02-13T19:25:02Z

notes/testable_examples.md

+
+**Warning:** code blocks without a file name will not have syntax highlighting.
+Once this feature is implemented, the currently used automatic language
+inference should be disabled.


I don't think this is a good idea. Historically code blocks meant OCaml. If you didn't want syntax highlighting you would use a verbatim block ({v v}). For small anonymous snippets it's tedious to always have to specify the language. It's better to simply default to OCaml, that's what the ocamldoc language was designed for.

I generally dislike implicit promotions, but let's respect the history in this case. I updated the proposal to match your suggestion.

dbuenzli · 2019-02-13T19:27:36Z

notes/testable_examples.md

+ Input cmti, cmt, cmi, mli or mld file.
+
+
+ Odoc 11VERSION11 odoc-extract-code(1)


Missing is a way to extract the list of names and their language. Following the current "odoc way" this could well be odoc extract-code-targets which would list each on their line the set of names and their tokens (see below) existing in the object file.

filename token*

This was in my plans but I forgot to include it. Thanks.

dbuenzli · 2019-02-13T19:32:37Z

notes/testable_examples.md

+
+ -o PATH, --output=PATH
+ Output file path. If omitted, the provided NAME will be used.
+ Required for extraction of anonymous code blocks.


For convenience, one could add an --all option which would extracts everything in one go and -o would then denote the destination directory. That way you can distribute a tutorial as an .mld file and simply instruct someone to odoc extract-code --all -o . tutorial.mld to be setup.

For example this gist and associated blog post could well be distributed as a single .mld file (see for example the "# Generate the html page" in the toplevel comment...).

dbuenzli · 2019-02-13T19:35:04Z

notes/testable_examples.md

+| **Anonymous code block** | **Named code block** |
+| ------------------------ | ----------------------------------- |
+| `"{[" <content> "]}"` | `"{" <filename> "[" <content> "]}"` |
+


One thing this proposal doesn't mention but must be clarified is what happens to named code block which are within stop comments (**/**) ... (**/**).

I think these should be included in the file aswell (they allow to hide setup minutiae you may not want to have in the rendered document).

But… do we get these in .mld files ?

Corrected the proposal, let me know if this is what you had in mind.

But… do we get these in .mld files ?

Maybe the extended code block annotation could have an option for that...

dbuenzli · 2019-02-13T19:55:15Z

notes/testable_examples.md

+
+- Should odoc require code block annotations to be filenames with extension?
+ The extension could be used to identify the language and correctly do code
+ highlighting. On the other hand the code blocks could be annotated only with


Maybe one way would be to simply have these two forms

{[:filename token* [ ... ]} {[ token* [ ... ]}

without more constraints than that.

With filename a syntax that allows a relative file path (that way an .mld file can specify a source file hierarchy) and token arbitrary identifiers that allow the : character.

That way authors and processors are free to collaborate on the meaning they want to give to tokens. Of course odoc should standardize a few behaviour like:

On {[:f.$EXT [...]}, odoc HTML gen will try to highlight the block using the most likely language associated to the extension $EXT (if present).

On {[:f.$EXT lang:$L [...]}, odoc HTML gen will try to highlight using language $L.

On {[ lang:$L [...]}, odoc HTML gen will try to highlight using language $L.

I think this is a reasonable proposal. I also think this should be the focus for the actual work in the current PR. I'll elaborate on this a bit later.

dbuenzli · 2019-02-13T20:23:46Z

notes/testable_examples.md

+
+ --anonymous
+ Extract code blocks without name. Cannot be used with the `--name'
+ option.


Why not. Note however that my guess is that unless the author pays particular attention to her anonymous code block the result will likely be garbage most of the time (you may have code showing errors, or even get different languages...)

Echoing @jsomers desire for certain fileless senarios (though formally you could always simply specify a filename to use by convention). This could be --anonymous=TOKEN, and would extract anonymous code blocks that have the token TOKEN (e.g. ocamlx).

It would also prevent the likely garbage problem I mentioned, if completely unconstrained.

How is it anonymous if it has a name (ocamlx in your example)?

But I do understand your concerns. I will try to address this in the new proposal for code block annotations.

Still regarding anonymous blocks:

The only thing I'm a little bit sceptical about is the extraction of
anonymous blocks, it will likely often be made of garbage so I wouldn't
bother.

It was one of the @lpw25's concerns, I believe. I imagine for large code bases
it would be useful to operate on existing unmarked code blocks in some way. So
if people do want to execute them they might get an opportunity to fix the
"garbage".

How is it anonymous if it has a name (ocamlx in your example)?

In the sense it has no filename.

Basically in the two forms here. The first one is a named code block and the second one is an anonymous one.

I quite like the idea of being able to extract by token, this leaves a lot of flexibity to authors to attribute meaning to what they want to extract.

I also like the idea of token annotations. Without wanting to commit to a concrete syntax, something like {ocaml id=BLOCK_ID[...]} could work, where BLOCK_ID could be anything including a file name. This of course would be used during extraction. I'd make the --output option required though, to clarify that the id is not necessarily a file name.

I still think it's preferable to have to direct support for filenames.

jsomers · 2019-02-13T21:53:30Z

Thanks @rizo for putting together this design. It jibes with what I've been thinking about, though I think there is one key difference.

Should odoc require code block annotations to be filenames with extension?

I don't think so. For an mli file, the workflow I imagined was:

In a doc comment that includes a code example you'd like to execute, you add a small annotation, either {ocaml| or {x| (for "execute" -- we can assume the block is OCaml; perhaps you could have another token to specify the language).
In your dune file in the mli's directory, you specify that you want to execute marked code blocks.
If your code blocks contain toplevel syntax, like

# 1 + 1
- : int = 2

then you go through the diff-and-promote workflow. (That is, if the output is incorrect, you generate a .corrected file and allow the user to easily diff against and accept this correction.) There shouldn't be any need to add line number directives -- the tool should figure out where in the mli file this code lives, and should be able to patch in the correction appropriately. (Toplevel_expect_test and mdx both do this.)

Otherwise, if you're not dealing with toplevel-style code, you simply throw a compiler error if the code block doesn't compile.

Note that this workflow is almost exactly the same as processing a markdown file with mdx, except that instead of executing the code in triple-backtick blocks, you're executing the code in {ocaml| or {x| blocks.

For mld files, the flow would be basically identical to mdx, except that instead of the base markup language being Markdown, it would be the markup language used by odoc.

By doing things this way, without filenames, you do give up the ability to write code in one block that references input from files extracted from other code blocks -- but I don't think that's nearly as important as being able to simply run small self-contained examples that use the library you're documenting.

As the user of the tool, I don't want to have to think about the code in my doc comments being extracted out into a separate file where it's then executed; I just want the code executed. If the tool needs to extract my code to separate files under the hood, that's fine, but I don't think that detail should be exposed to the user. The user (at least this user) just wants to say "execute this block" on some blocks.

dbuenzli · 2019-02-13T22:17:03Z

@jsomers I don't think what is described here prevents from implementing a system that does exactly what you want. What is described here is rather the plumbing and a more general mecanism that allows you to implement the system you would like as a special case.

There shouldn't be any need to add line number directives -- the tool should figure out where in the mli file this code lives, and should be able to patch in the correction appropriately.

Extracting with line directives is precisely what will allow the underlying tool to figure out where in the mli/mld file the code you are processing lived and, for example if you compile the extraction, to actually report any compilation error in the original file and location rather than in the extracted file.

jsomers · 2019-02-13T22:40:30Z

@dbuenzli

There shouldn't be any need to add line number directives -- the tool should figure out where in the mli file this code lives, and should be able to patch in the correction appropriately.

Extracting with line directives is precisely what will allow the underlying tool to figure out where in the mli/mld file the code you are processing lived and, for example if you compile the extraction, to actually report any compilation error in the original file and location rather than in the extracted file.

Ah, I see -- my worry is that we would require the end user to add line-number directives by adding a special notation. But it sounds like you're saying we wouldn't.

dbuenzli · 2019-02-13T22:53:11Z

Ah, I see -- my worry is that we would require the end user to add line-number directives by adding a special notation. But it sounds like you're saying we wouldn't.

No, that would be absolutely terrible :-) Basically the lineno stuff would work as as follows. Suppose you have your file.mld:

Bla bla
{ocamlx[
let f x = x
]}
Ho Ho
{ocamlx[ 
let () = print_endliiine "bla.ml"
]}

Invoking odoc extract-code --with-line-numbers --anonymous=ocamlx -o ocamlx.ml file.mld would produce the ocamlx.ml file:

#3 "file.mld" 
let f x = x
#7 "file.mld"
let () = print_endliiine "bla.ml"

Now you can try for yourself to compile that with ocamlc that's what gets reported:

> ocamlc ocamlx.ml
File "file.mld", line 7, characters 9-24:
Error: Unbound value print_endliiine
Hint: Did you mean print_endline?

aantron · 2019-02-13T23:43:25Z

notes/testable_examples.md

+
+Library authors are encouraged to include examples and short snippets of code
+in documentation to demonstrate how to effectively use their library. Such code
+snippets are included in docstrings as code blocks and therefor cannot be


s/therefor/therefore

aantron · 2019-02-13T23:45:17Z

notes/testable_examples.md

+snippets are included in docstrings as code blocks and therefor cannot be
+executed and tested in the same way regular source files are. This leads to
+code duplication for library authors who want to make sure their examples can
+be correctly executed, and to out of date examples when they forget to update


"out-of-date" for the adjective.

aantron · 2019-02-13T23:45:53Z

notes/testable_examples.md

+
+To address this problem odoc implements the ability to extract code blocks from
+documented interfaces and documentation pages (`mli` and `mld` files
+respectively) into source code files. With this build systems can implement


s/With this/With this,

aantron · 2019-02-13T23:46:36Z

notes/testable_examples.md

+
+## Named code blocks
+
+In the new version of odoc code blocks can be annotated with a file name. This


s/odoc/odoc,

aantron · 2019-02-13T23:52:22Z

notes/testable_examples.md

+Both named and anonymous code blocks can be extracted by odoc via the
+command-line interface. Code blocks with the same file name in a given
+documentation file will be concatenated and written into a file with that name.
+Optionally a different output file name for a given group can be provided.


s/Optionally/Optionally,

aantron · 2019-02-13T23:54:48Z

notes/testable_examples.md

+number directives](https://caml.inria.fr/pub/docs/manual-ocaml/lex.html#sec86)
+in the OCaml manual).
+
+**Note**: unrelated code blocks do not need to have a unique file name, it is


Does this mean they will all go into one file example.ml? If so, this should read something like "unrelated code blocks can be put into the same file, and so don't need unique file names." However, why would it be good to put unrelated examples into one file?

However, why would it be good to put unrelated examples into one file?

If an interface has hundreds of functions each one of them with examples, will users want to provide a unique name to each code block to get code execution and syntax highlighting?

It is clear that filenames are not a satisfactory way to annotate code blocks, so this needs to be changed in the proposal anyway.

lpw25 · 2019-02-14T08:26:43Z

Thanks for the design doc @rizo that makes your plan much clearer. I think the aim is a good one but in general I think that the proposed extract-code command is not the right approach to implement what you are aiming for. From your comments I think you already know what the disadvantages and alternatives are, but I'll make them explicit here anyway:

The need to talk about external files is an unnecessary burden for users. In the common case they simply want to execute all the comments in a single environment. This behaviour should be the default. If they wish to keep some comments separate there should be an optional argument like the env argument in mdx to specify named environments with no requirement that they be unique filenames, something like:

{ocaml env=foo[
...
]}

If they wish not to run a particular comment there should be an optional argument for that to:

{ocaml executed=false[
...
]}

The dune support should simply be something like (execute-examples) in the documentation stanza. Having to manage the extraction of multiple files is an unnecessary burden on the user.

The extract-code command is not sufficient to implement the "promote" aspect robustly. It provides no mechanism to pass arguments about how the code should be run to the actually testing tool -- for example mdx's non-deterministic and version options on code snippets would be useful. Line directives are not a robust means of communicating when blocks start and end -- if I put a line directive in my comment fragments the tool will simply break. To do this kind of thing robustly you would need some quoting mechanism and to use a format other than just an .ml file.

The extract-code command is also not necessary to implement the "promote" aspect. The comment parser is already available and the tool can simply use that on the parsetree of the mli file directly.

I started experimenting with mdx and it seems like it could be used to build a tool (I'm calling it odoc-mdx) that would use odoc's parser as a library to extract the code blocks from mli/mld files, execute them and generate .corrected files. I'm hoping dune can be extended to natively support this workflow.

This is I think a better approach than extract-code and I think it would be better to focus on this.

Maybe something like the following could work? The extraction command would match on the file value.

{ocaml file=hello.ml[ ... ]}

I think that would be better for implementing extract-code. As I said above, extract-code on mld files is as useful command for literate programming and doing it this way will prevent it from getting in the way of the mdx-based alternative.

dbuenzli · 2019-02-14T10:34:45Z

I think that would be better for implementing extract-code. As I said above, extract-code on mld files is as useful command for literate programming and doing it this way will prevent it from getting in the way of the mdx-based alternative.

I'm not exactly sure how something generic like this gets in the way of mdx-based alternatives. Let's seperate concerns here:

Basically the {[]} has to be extended in a useful way. So let's agree on the syntax. What's your proposal ? Simply a list of uninterpreted tokens ? I'm fine with this aswell though we should be careful that such a token allows to express a file path.
Regarding extract-code if you think it's not useful for you and you can do better using an external tool then by all means go build that tool. Personally I have quite a few uses for it and I would like to see it implemented in some way as proposed here.

lpw25 · 2019-02-14T11:21:01Z

These samples are full executables.

Thanks that makes things clearer. So you have some literate programming within the mli files of a library. That's a reasonable use case. My preference would be to have extract-code take the .mli file directly as input, since this should really be an operation on source files rather than installed files. It shouldn't be much harder to implement than the cmti version -- note that dune already takes care of preprocessing the mli file to a (marshalled) .mli file so you don't need to deal with -pp -- but if that is not the case then the .cmti file based version is a reasonable alternative.

dbuenzli · 2019-02-14T11:48:36Z

I agree overall that the default behavior should be execute all ocaml code blocks without any extra configuration.

I don't think this concerns this actual proposal but I would rather not do this and rely on a special token for that (e.g. ocamlx).

You still need a bit of ordering care when you start executing code blocks. When I don't want to care about this I prefer not having to care... by which I mean that having to specify an exec=false would mean I would have to care. For example I want to be able to simply add an ocaml code block anywhere in the document without having to think about it or have to define all its free variables so that it can get executed.

For all it's awesomeness and shininess I still think that executing code blocks in API docs remains quite a limited tool as soon as you escape the realm of documenting basic data structures.

rizo · 2019-02-14T11:51:18Z

The need to talk about external files is an unnecessary burden for users. In
the common case they simply want to execute all the comments in a single
environment. This behaviour should be the default. If they wish to keep some
comments separate there should be an optional argument like the env (...)

I added this to the requirements in the proposal. To be clear the default behavior in my current proposal is to execute all code blocks that have a file name with the ml extension. Users don't have to think about extraction at all (the extract_code option in dune is optional, and so is execute_code).

I do agree that annotating code blocks with the language and allowing arguments like env is a better approach. I don't see why different environments couldn't be extracted into files though, if users really want that. The two uses cases being: (1) using other code blocks as data files for execution and (2) installing examples for package distribution. Use case 1 would be done by the mdx-style tool (i.e. it would extract the data files before execution), and use case 2 would be done by dune (or other build systems). I think this applies to both mld and mli files.

Non of this has to be implemented for the initial version, of course. We can start with a simple mdx-style tool that executes ocaml code blocks (potentially with the x annotation) and ignores all others. At this stage extraction for mld files can be implemented to support self-contained literate programming documents described by @dbuenzli. Finally, the extraction could be implemented for mli files, along with dune support.

I'll try to clarify this in my updated proposal.

rizo · 2019-02-14T11:53:44Z

@lpw25:

The extract-code command is also not necessary to implement the "promote" aspect.

I agree. As long as the code blocks are properly annotated with language and
execution options, the complementary tooling can implement the promotion
workflow without the extract-code command.

rizo · 2019-02-14T12:00:02Z

I don't think this concerns this actual proposal but I would rather not do this and rely on a special token for that (e.g. ocamlx).

I personally don't like the ocamlx as an annotation. If we want to annotate code blocks with language identifiers they should be known language identifiers. We would need to do the same for reason syntax, by the way, and having reasonx or rex doesn't look good to me.

But yeah, executing everything unconditionally is also quite extreme. I'll have this in mind and will make a proposal for code block annotations that allow this kind of flexibility.

lpw25 · 2019-02-14T12:00:11Z

You still need a bit of ordering care when you start executing code blocks. When I don't want to care about this I prefer not having to care... by which I mean that having to specify an exec=false would mean I would have to care.

I suspect that the best thing to do here is to let the default be specified. I would expect both extract-code and the mdx-like tooling to take an option that said whether to assume exec=false or exec=true as the default.

rizo · 2019-02-14T12:06:18Z

I suspect that the best thing to do here is to let the default be specified.

Then it's not really a default, in that case ;-).

But I agree that this could be configured on a level higher than a code block. The question is more about what would be the default in case users don't provide this option.

dbuenzli · 2019-02-14T12:09:48Z

I suspect that the best thing to do here is to let the default be specified.

I suspect this is really not a good idea. It's always better to keep documents self-describing to the readers. If you do this then I must hunt in the build system what default you choose and then remember it in my head when I read a code block.

I would really go for not executable by default and maybe simply {$LANG exec[ ... ]} for those blocks that you want to execute. That works with any language, is clear to the reader and not hard to type.

lpw25 · 2019-02-14T12:18:57Z

There are going to be plenty of people who want to test all their code in comments with "odocx" and they're going to be pretty annoyed with having to write ocaml exec=true hundreds of times.

dbuenzli · 2019-02-14T12:29:36Z

ocaml exec=true hundreds of times.

That's the reason why I proposed ocaml exec ;-)

dbuenzli · 2019-02-14T12:38:08Z

(even {exec[]} since code blocks default to OCaml)

XVilka · 2019-06-18T05:54:58Z

There are conflicts now, so should be rebased.

thomassa · 2020-03-01T22:38:30Z

test/html/cases/markup.mli

@@ -88,6 +88,14 @@
 ignore foo
 ]}

+ Code blocks can have an identifier attached to them:
+
+ {ocaml[


Misleading example? Using ocaml as the identifier makes it looks like an annotation specifying the language, rather than an identifier to enable references to this specific block from elsewhere.

Suggested change

{ocaml[

{print-two-plus-two[

The current definition of annotations needs to be changed in my opinion as it has some limitations. Unfortunately I haven't been available to work on this in the last few months.

To clarify the misunderstanding here: the code block annotation is not used to reference a specific block from elsewhere (it is not a unique identifier). It annotates a class of blocks to be extracted. It could be a language or a filename, or any other user-defined "class".

Not sure if this is helpful, but let me know if you have any other questions or suggestions.

dbuenzli · 2022-02-11T08:37:04Z

Could we move on this ?

Extend the language to support identifiers in code blocks.

69925af

Add a draft proposal for the testable examples feature.

5a001e0

dbuenzli reviewed Feb 13, 2019

View reviewed changes

aantron reviewed Feb 14, 2019

View reviewed changes

Add new requirements. Fix typos/punctuation.

ff3b025

This was referenced Feb 14, 2019

Add support for odoc realworldocaml/mdx#17

Closed

Support for literate programming ocaml/dune#895

Closed

aantron force-pushed the master branch from 210be29 to 4e328c5 Compare July 12, 2019 10:05

XVilka mentioned this pull request Aug 29, 2019

Deprecate argot documentation BinaryAnalysisPlatform/bap#981

Closed

dbuenzli mentioned this pull request Sep 1, 2019

Should nested comment+code_block be accepted ? #375

Closed

dbuenzli mentioned this pull request Oct 9, 2019

Set language in code blocks #380

Closed

thomassa reviewed Mar 1, 2020

View reviewed changes

aantron force-pushed the master branch 4 times, most recently from 44dd495 to cc1f437 Compare May 1, 2020 15:50

dbuenzli mentioned this pull request May 31, 2020

[Doc] Improve API documentation with small examples ocsigen/js_of_ocaml#1020

Open

dbuenzli mentioned this pull request Mar 11, 2021

Syntax for metadata on code blocks #633

Closed

dbuenzli mentioned this pull request Mar 1, 2022

doc/tutorial.md: fixed tiny typo dbuenzli/cmdliner#137

Closed

dbuenzli mentioned this pull request Jan 13, 2023

Typo in Getting Started dbuenzli/cmdliner#155

Closed

dbuenzli mentioned this pull request Mar 10, 2023

Dynarrays, boxed ocaml/ocaml#11882

Merged

5 tasks

EmileTrotignon added the after-2.4 label Dec 5, 2023

		Input cmti, cmt, cmi, mli or mld file.


		Odoc 11VERSION11 odoc-extract-code(1)


		## Named code blocks

		In the new version of odoc code blocks can be annotated with a file name. This

[WIP] Extract code blocks by ID #303

Are you sure you want to change the base?

[WIP] Extract code blocks by ID #303

Conversation

rizo commented Feb 12, 2019

pmetzger commented Feb 12, 2019

lpw25 commented Feb 13, 2019

dbuenzli commented Feb 13, 2019

lpw25 commented Feb 13, 2019

lpw25 commented Feb 13, 2019

dbuenzli commented Feb 13, 2019 • edited

lpw25 commented Feb 13, 2019 • edited

dbuenzli commented Feb 13, 2019

lpw25 commented Feb 13, 2019

lpw25 commented Feb 13, 2019

dbuenzli commented Feb 13, 2019 • edited

lpw25 commented Feb 13, 2019

dbuenzli commented Feb 13, 2019

lpw25 commented Feb 13, 2019

dbuenzli commented Feb 13, 2019 • edited

lpw25 commented Feb 13, 2019

dbuenzli commented Feb 13, 2019 • edited

lpw25 commented Feb 13, 2019

rizo commented Feb 13, 2019 • edited

dbuenzli left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

dbuenzli Feb 13, 2019 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

dbuenzli Feb 13, 2019 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

dbuenzli Feb 13, 2019 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

rizo Feb 14, 2019 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jsomers commented Feb 13, 2019

dbuenzli commented Feb 13, 2019 • edited

jsomers commented Feb 13, 2019 • edited

dbuenzli commented Feb 13, 2019 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

lpw25 commented Feb 14, 2019

dbuenzli commented Feb 14, 2019

lpw25 commented Feb 14, 2019

dbuenzli commented Feb 14, 2019

rizo commented Feb 14, 2019 • edited

rizo commented Feb 14, 2019

rizo commented Feb 14, 2019 • edited

lpw25 commented Feb 14, 2019

rizo commented Feb 14, 2019

dbuenzli commented Feb 14, 2019

lpw25 commented Feb 14, 2019

dbuenzli commented Feb 14, 2019

dbuenzli commented Feb 14, 2019

XVilka commented Jun 18, 2019

Choose a reason for hiding this comment

Choose a reason for hiding this comment

dbuenzli commented Feb 11, 2022

dbuenzli commented Feb 13, 2019 •

edited

lpw25 commented Feb 13, 2019 •

edited

dbuenzli commented Feb 13, 2019 •

edited

dbuenzli commented Feb 13, 2019 •

edited

dbuenzli commented Feb 13, 2019 •

edited

rizo commented Feb 13, 2019 •

edited

dbuenzli Feb 13, 2019 •

edited

dbuenzli Feb 13, 2019 •

edited

dbuenzli Feb 13, 2019 •

edited

rizo Feb 14, 2019 •

edited

dbuenzli commented Feb 13, 2019 •

edited

jsomers commented Feb 13, 2019 •

edited

dbuenzli commented Feb 13, 2019 •

edited

rizo commented Feb 14, 2019 •

edited

rizo commented Feb 14, 2019 •

edited