Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stronger types #87

Open
tv42 opened this issue Nov 4, 2023 · 5 comments
Open

Stronger types #87

tv42 opened this issue Nov 4, 2023 · 5 comments

Comments

@tv42
Copy link

tv42 commented Nov 4, 2023

Hi. I started using markdown-rs and noticed I had to add a panic into my code just because everything is a Node, but List can only really contain ListItems. Since 1.x is still in alpha, I figured this would be a good time to speak up.

Have you considered making the AST types stronger? I think List.children should be Vec<ListItem>, and similarly e.g. Strong cannot contain a Paragraph or a Heading..

mdast supports this line of thinking:

and so on.

I should be able to contribute code too, if you're interested and breaking the API is ok.

@wooorm
Copy link
Owner

wooorm commented Nov 6, 2023

Yes, supporting content types of mdast more exactly would be very nice! This is actually my main blocker for v1. If you want to work on that: great!

@tv42
Copy link
Author

tv42 commented Nov 7, 2023

Having spent a day looking at the source, I'm less enthusiastic. Changing the resulting data structure is simple enough, but the parser is heavily built on the idea that all nodes are interchangeable, their correct layout happens mostly by the incoming events not being arbitrary, and there's a lot of mystic stuff there that's not obvious, like the context stack, "virtual steps", links to other events (all of previous, next, content; also seemingly into the future), etc.

I'm still looking into it, but I don't think this change will happen without either

  • rewriting the parser (but not tokenizer)
  • adding a post-processing step that constructs the final tree out of a temporary "anything goes" tree (which seems weird and wasteful)

I'm exploring what the parser would have to look like, but the undocumented details of the events are getting in the way.

How confident are you in the test coverage for the parser? It seems the tests are mostly in terms of to_html, with roughly one AST-structure asserting test per node type.

@wooorm
Copy link
Owner

wooorm commented Dec 8, 2023

yeah, markdown is incredibly complex. And the extension mechanism is also very complex.

The parser is very well tested.
There are few node tests, just the ones for coverage, because the AST is slated to change

@xelatihy
Copy link

I was looking into this for some projects I am working on. I suspect that changing the parser is really hard, and in fact may not be a good idea overall.

An alternative might be to have a "higher-level" AST that is strictly typed, and let the library user decide which one to use. The high-level, strictly typed, AST would be generated from the low-level one that exists now, without changing the parser.

In fact, that is what I am doing in my project. I am trying to model my AST similarly to the pandoc AST types, but adapted with a bit of richer types.

If there is interest in this idea, and if my project works, I would by happy to contribute code.

@wooorm
Copy link
Owner

wooorm commented Dec 21, 2023

AST already exists, and is prevalent across the JavaScript ecosystem, but needs to be better modelled here in Rust: https://github.com/DefinitelyTyped/DefinitelyTyped/blob/master/types/mdast/index.d.ts.

Or, what do you mean with “higher-level AST”?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants