Bootstrap AST and parser #1

Merged
merged 4 commits into from
Nov 18, 2021
1 change: 1 addition & 0 deletions .clippy.toml
@@ -6,5 +6,6 @@ standard-macro-braces = [
{ name = "assert", brace = "(" },
{ name = "assert_eq", brace = "(" },
{ name = "assert_ne", brace = "(" },
{ name = "matches", brace = "(" },
{ name = "vec", brace = "[" },
]
5 changes: 4 additions & 1 deletion CHANGELOG.md
@@ -11,9 +11,12 @@ All user visible changes to `cucumber-expressions` crate will be documented in t

### Added

- ???
- [Cucumber Expressions] AST and parser. ([#1])

[#1]: /../../pull/1




[Cucumber Expressions]: https://github.com/cucumber/cucumber-expressions#readme
[Semantic Versioning 2.0.0]: https://semver.org
7 changes: 7 additions & 0 deletions Cargo.toml
@@ -3,6 +3,7 @@ name = "cucumber-expressions"
version = "0.1.0-dev"
edition = "2021"
rust-version = "1.56"
description = "Cucumber Expressions AST and parser."
license = "MIT OR Apache-2.0"
authors = [
"Ilya Solovyiov <[email protected]>",
@@ -17,3 +18,9 @@ keywords = ["cucumber", "expression", "expressions", "cucumber-expressions"]
include = ["/src/", "/LICENSE-*", "/README.md", "/CHANGELOG.md"]

[dependencies]
derive_more = { version = "0.99.16", features = ["as_ref", "deref", "deref_mut", "display", "error"], default_features = false }
nom = "7.0"
nom_locate = "4.0"

# TODO: Remove once `derive_more` 0.99.17 is released.
syn = "1.0.81"
39 changes: 39 additions & 0 deletions README.md
@@ -15,6 +15,42 @@ This crate provides [AST] and parser of [Cucumber Expressions].



## Grammar

This implementation follows a context-free grammar, [which isn't yet merged][1]. The original grammar is impossible to follow while creating a performant parser: it contains errors, it describes a superset of the [Cucumber Expressions] language rather than the exact language, and it is context-sensitive. If you find any inconsistencies between this implementation and the ones in other languages, please file an issue!

[EBNF] spec of the current context-free grammar implemented by this crate:
```ebnf
expression              = single-expression*

single-expression       = alternation
                        | optional
                        | parameter
                        | text-without-whitespace+
                        | whitespace
text-without-whitespace = (- (text-to-escape | whitespace))
                        | ('\', text-to-escape)
text-to-escape          = '(' | '{' | '/' | '\'

alternation             = single-alternation, ('/', single-alternation)+
single-alternation      = ((text-in-alternative+, optional*)
                          | (optional+, text-in-alternative+))+
text-in-alternative     = (- alternative-to-escape)
                        | ('\', alternative-to-escape)
alternative-to-escape   = ' ' | '(' | '{' | '/' | '\'

optional                = '(', text-in-optional+, ')'
text-in-optional        = (- optional-to-escape) | ('\', optional-to-escape)
optional-to-escape      = '(' | ')' | '{' | '/' | '\'

parameter               = '{', name*, '}'
name                    = (- name-to-escape) | ('\', name-to-escape)
name-to-escape          = '{' | '}' | '(' | '/' | '\'
```
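To illustrate how the `optional` and `parameter` rules above handle escaping, here is a minimal, standard-library-only sketch of two recognizers. The function names `optional` and `parameter` are hypothetical and are not this crate's API (the crate itself builds on `nom`); each returns the raw inner text, with `\` escapes left in place as a span into the original input would keep them, plus the unconsumed rest of the input.

```rust
/// Hypothetical recognizer for `optional = '(', text-in-optional+, ')'`.
/// Returns `(raw inner text, rest)`, or `None` if the rule doesn't match.
fn optional(input: &str) -> Option<(&str, &str)> {
    let body = input.strip_prefix('(')?;
    let mut chars = body.char_indices();
    while let Some((i, c)) = chars.next() {
        match c {
            // `)` closes the optional; `text-in-optional+` needs at least one char.
            ')' => return (i > 0).then(|| (&body[..i], &body[i + 1..])),
            // `\` must be followed by one of `optional-to-escape`.
            '\\' => match chars.next() {
                Some((_, '(' | ')' | '{' | '/' | '\\')) => {}
                _ => return None,
            },
            // Unescaped `(`, `{` and `/` are not allowed inside an optional.
            '(' | '{' | '/' => return None,
            _ => {}
        }
    }
    None // Unclosed `(`.
}

/// Hypothetical recognizer for `parameter = '{', name*, '}'`.
/// Unlike `optional`, the name may be empty.
fn parameter(input: &str) -> Option<(&str, &str)> {
    let body = input.strip_prefix('{')?;
    let mut chars = body.char_indices();
    while let Some((i, c)) = chars.next() {
        match c {
            '}' => return Some((&body[..i], &body[i + 1..])),
            // `\` must be followed by one of `name-to-escape`.
            '\\' => match chars.next() {
                Some((_, '{' | '}' | '(' | '/' | '\\')) => {}
                _ => return None,
            },
            '{' | '(' | '/' => return None,
            _ => {}
        }
    }
    None // Unclosed `{`.
}
```

The sketch reflects an asymmetry the grammar encodes: `optional` needs at least one character (`text-in-optional+`), while `parameter` accepts an empty name (`name*`), so `{}` is accepted but `()` is not.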




## License

This project is licensed under either of
@@ -29,3 +65,6 @@ at your option.

[AST]: https://en.wikipedia.org/wiki/Abstract_syntax_tree
[Cucumber Expressions]: https://github.com/cucumber/cucumber-expressions#readme
[EBNF]: https://en.wikipedia.org/wiki/Extended_Backus–Naur_form

[1]: https://github.com/cucumber/cucumber-expressions/issues/41
174 changes: 174 additions & 0 deletions src/ast.rs
@@ -0,0 +1,174 @@
// Copyright (c) 2021 Brendan Molloy <[email protected]>,
// Ilya Solovyiov <[email protected]>,
// Kai Ren <[email protected]>
//
// Licensed under the Apache License, Version 2.0 <LICENSE-APACHE or
// http://www.apache.org/licenses/LICENSE-2.0> or the MIT license
// <LICENSE-MIT or http://opensource.org/licenses/MIT>, at your
// option. This file may not be copied, modified, or distributed
// except according to those terms.

//! [Cucumber Expressions][1] [AST].
//!
//! See details in the [grammar spec][0].
//!
//! [0]: crate#grammar
//! [1]: https://github.com/cucumber/cucumber-expressions#readme
//! [AST]: https://en.wikipedia.org/wiki/Abstract_syntax_tree

use derive_more::{AsRef, Deref, DerefMut};
use nom::{error::ErrorKind, Err, InputLength};
use nom_locate::LocatedSpan;

use crate::parse;

/// [`str`] along with its location information in the original input.
pub type Spanned<'s> = LocatedSpan<&'s str>;

/// Top-level `expression` defined in the [grammar spec][0].
///
/// See [`parse::expression()`] for the detailed grammar and examples.
///
/// [0]: crate#grammar
#[derive(AsRef, Clone, Debug, Deref, DerefMut, Eq, PartialEq)]
pub struct Expression<Input>(pub Vec<SingleExpression<Input>>);

impl<'s> TryFrom<&'s str> for Expression<Spanned<'s>> {
    type Error = parse::Error<Spanned<'s>>;

    fn try_from(value: &'s str) -> Result<Self, Self::Error> {
        parse::expression(Spanned::new(value))
            .map_err(|e| match e {
                Err::Error(e) | Err::Failure(e) => e,
                Err::Incomplete(n) => parse::Error::Needed(n),
            })
            .and_then(|(rest, parsed)| {
                rest.is_empty()
                    .then(|| parsed)
                    .ok_or(parse::Error::Other(rest, ErrorKind::Verify))
            })
    }
}
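The `and_then` step above rejects inputs that `parse::expression()` only partially consumed. The same shape, reduced to a hypothetical toy parser (`digits`) and error type (`ParseError`) so that it runs with the standard library alone, looks roughly like this:

```rust
/// Hypothetical error type, loosely mirroring `parse::Error::Other(rest, ErrorKind::Verify)`.
#[derive(Debug, PartialEq)]
enum ParseError<'s> {
    /// Nothing could be parsed at all.
    NoMatch,
    /// Parsing succeeded, but this part of the input was left unconsumed.
    Leftover(&'s str),
}

/// Toy `nom`-style parser: consumes leading ASCII digits and returns `(rest, parsed)`.
fn digits(input: &str) -> Result<(&str, &str), ParseError<'_>> {
    let end = input
        .find(|c: char| !c.is_ascii_digit())
        .unwrap_or(input.len());
    if end == 0 {
        return Err(ParseError::NoMatch);
    }
    Ok((&input[end..], &input[..end]))
}

/// Same shape as `TryFrom<&str> for Expression`: succeeds only if the whole input is consumed.
fn parse_all(input: &str) -> Result<&str, ParseError<'_>> {
    digits(input).and_then(|(rest, parsed)| {
        rest.is_empty()
            .then(|| parsed)
            .ok_or(ParseError::Leftover(rest))
    })
}
```

Keeping the leftover input in the error, as `parse::Error::Other(rest, ..)` does, lets a caller point at the exact position where parsing stopped.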

impl<'s> Expression<Spanned<'s>> {
    /// Parses the given `input` as an [`Expression`].
    ///
    /// # Errors
    ///
    /// See [`parse::Error`] for details.
    pub fn parse<I: AsRef<str> + ?Sized>(
        input: &'s I,
    ) -> Result<Self, parse::Error<Spanned<'s>>> {
        Self::try_from(input.as_ref())
    }
}

/// `single-expression` defined in the [grammar spec][0], representing a single
/// entry of an [`Expression`].
///
/// See [`parse::single_expression()`] for the detailed grammar and examples.
///
/// [0]: crate#grammar
#[derive(Clone, Debug, Eq, PartialEq)]
pub enum SingleExpression<Input> {
    /// [`alternation`][0] expression.
    ///
    /// [0]: crate#grammar
    Alternation(Alternation<Input>),

    /// [`optional`][0] expression.
    ///
    /// [0]: crate#grammar
    Optional(Optional<Input>),

    /// [`parameter`][0] expression.
    ///
    /// [0]: crate#grammar
    Parameter(Parameter<Input>),

    /// Text without whitespaces.
    Text(Input),

    /// Whitespaces are treated as a special case to avoid placing every `text`
    /// character in a separate [AST] node, as described in the
    /// [grammar spec][0].
    ///
    /// [0]: crate#grammar
    /// [AST]: https://en.wikipedia.org/wiki/Abstract_syntax_tree
    Whitespaces(Input),
}
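The idea behind the `Text` and `Whitespaces` variants (grouping characters into runs, so that not every character becomes its own node) can be sketched over plain text containing no special characters. `Chunk` and `chunks` are hypothetical simplifications, and the sketch assumes `whitespace` is the space character `' '`. It also groups consecutive spaces into a single run, which may differ from how this crate's grammar treats repeated `whitespace`.

```rust
/// Simplified stand-in for `SingleExpression`, covering only the two text-like variants.
#[derive(Debug, PartialEq)]
enum Chunk<'s> {
    Text(&'s str),
    Whitespaces(&'s str),
}

/// Hypothetical helper: splits plain text (no `(`, `{`, `/` or `\`) into maximal
/// runs of non-space and space characters, one node per run.
fn chunks(input: &str) -> Vec<Chunk<'_>> {
    let mut out = Vec::new();
    let mut rest = input;
    while let Some(first) = rest.chars().next() {
        let is_ws = first == ' ';
        // Length of the run of characters in the same class as `first`.
        let len = rest
            .find(|c: char| (c == ' ') != is_ws)
            .unwrap_or(rest.len());
        let (run, tail) = rest.split_at(len);
        out.push(if is_ws { Chunk::Whitespaces(run) } else { Chunk::Text(run) });
        rest = tail;
    }
    out
}
```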

/// `single-alternation` defined in the [grammar spec][0], representing a
/// building block of an [`Alternation`].
///
/// [0]: crate#grammar
pub type SingleAlternation<Input> = Vec<Alternative<Input>>;

/// `alternation` defined in the [grammar spec][0], matching one of its
/// [`SingleAlternation`]s.
///
/// See [`parse::alternation()`] for the detailed grammar and examples.
///
/// [0]: crate#grammar
#[derive(AsRef, Clone, Debug, Deref, DerefMut, Eq, PartialEq)]
pub struct Alternation<Input>(pub Vec<SingleAlternation<Input>>);

impl<Input: InputLength> Alternation<Input> {
    /// Returns the length of this [`Alternation`]'s span in the `Input`.
    pub(crate) fn span_len(&self) -> usize {
        self.0
            .iter()
            .flatten()
            .map(|alt| match alt {
                Alternative::Text(t) => t.input_len(),
                Alternative::Optional(opt) => opt.input_len() + 2,
            })
            .sum::<usize>()
            + self.len()
            - 1
    }

    /// Indicates whether any of [`SingleAlternation`]s consists only of
    /// [`Optional`]s.
    pub(crate) fn contains_only_optional(&self) -> bool {
        (**self).iter().any(|single_alt| {
            single_alt
                .iter()
                .all(|alt| matches!(alt, Alternative::Optional(_)))
        })
    }
}
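A worked example of the `span_len()` arithmetic, using hypothetical simplified types over `&str` and assuming each `Text` holds the raw span (so any `\` escapes are counted too): for `cat/dog(s)`, the texts contribute 3 + 3, the optional contributes 1 plus 2 for its parentheses, and two single alternations need 2 - 1 = 1 `/` separator, giving 10, which is exactly the input length.

```rust
/// Simplified stand-ins for the AST types above, over plain `&str`.
enum Alternative<'s> {
    Optional(&'s str), // inner text, without the surrounding `(` and `)`
    Text(&'s str),
}

struct Alternation<'s>(Vec<Vec<Alternative<'s>>>);

impl Alternation<'_> {
    /// Same arithmetic as `Alternation::span_len()`.
    fn span_len(&self) -> usize {
        self.0
            .iter()
            .flatten()
            .map(|alt| match alt {
                Alternative::Text(t) => t.len(),
                // `+ 2` accounts for the optional's `(` and `)`.
                Alternative::Optional(o) => o.len() + 2,
            })
            .sum::<usize>()
            // One `/` between each pair of neighboring single alternations.
            + self.0.len()
            - 1
    }

    /// Same check as `Alternation::contains_only_optional()`.
    fn contains_only_optional(&self) -> bool {
        self.0.iter().any(|single| {
            single.iter().all(|alt| matches!(alt, Alternative::Optional(_)))
        })
    }
}
```

For `cat/dog(s)`, `contains_only_optional()` is `false`, since neither single alternation is made up solely of optionals; an alternation like `a/(b)` would return `true`.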

/// `alternative` defined in the [grammar spec][0].
///
/// See [`parse::alternative()`] for the detailed grammar and examples.
///
/// [0]: crate#grammar
#[derive(Clone, Copy, Debug, Eq, PartialEq)]
pub enum Alternative<Input> {
    /// [`optional`][1] expression.
    ///
    /// [1]: crate#grammar
    Optional(Optional<Input>),

    /// Text.
    Text(Input),
Member Author

This deviates from the original grammar: the `parameter` variant is missing.

Need to discuss this on a voice call.

Member

@tyranron there is a `Note:` section, which says:

> While parameter is allowed to appear as part of alternative and option in the AST, such an AST is not a valid a Cucumber Expression.

Basically, ARCHITECTURE.md describes an AST that tries to be wider than Cucumber Expressions for some reason (the only one I can think of is to make the parser implementation simpler and then error on conversion to a real Cucumber Expression).
But in reality, it doesn't make much sense to me, especially the `alternative` definition:

alternative         = optional | parameter | text
text                = whitespace | ")" | "}" | .

This grammar suggests that `alternative` may contain unescaped whitespace, which is not true (example). Whitespace has to be escaped there, at least to avoid ambiguity.

Member

ilslv, Nov 8, 2021

It's not even EBNF, as I understand it: Wikipedia says that repetition is written as `{...}`, not as `(...)*`. It looks more like a regex, but still uses `,` for concatenation.

Member Author

@ilslv would you be so kind as to make a PR upstream that adjusts the described grammar to be accurate and precise? It really bothers me: the spec doesn't reflect reality, while implementations don't follow the spec 😕

Member Author

tyranron, Nov 8, 2021

@ilslv

> It's not even EBNF, as I understand, as wikipedia says that repetition is described with {...} and not with (...)*. It looks more like regex, but still has , for concatenation.

From your link: `*` is repetition, and `( ... )` is grouping, so we have a group repetition here. I don't see any mistakes in that. And Markdown has the ```ebnf notation 🤷‍♂️

}

/// `optional` defined in the [grammar spec][0], matching an optional
/// `Input`.
///
/// See [`parse::optional()`] for the detailed grammar and examples.
///
/// [0]: crate#grammar
#[derive(AsRef, Clone, Copy, Debug, Deref, DerefMut, Eq, PartialEq)]
pub struct Optional<Input>(pub Input);

/// `parameter` defined in the [grammar spec][0], matching some special
/// `Input` described by a [`Parameter`] name.
///
/// See [`parse::parameter()`] for the detailed grammar and examples.
///
/// [0]: crate#grammar
#[derive(AsRef, Clone, Copy, Debug, Deref, DerefMut, Eq, PartialEq)]
pub struct Parameter<Input>(pub Input);
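`Optional` and `Parameter` are single-field newtypes that derive `Deref` via `derive_more`, which is why `span_len()` can call `opt.input_len()` directly on an `&Optional<Input>`. A hand-written equivalent is sketched below; it is illustrative only, and the code `derive_more` actually generates may differ in detail.

```rust
use std::ops::Deref;

/// Hand-written equivalent of `#[derive(Deref)]` on a single-field newtype:
/// method calls on `Optional<I>` fall through to the wrapped `I`.
#[derive(Clone, Copy, Debug, Eq, PartialEq)]
struct Optional<I>(pub I);

impl<I> Deref for Optional<I> {
    type Target = I;

    fn deref(&self) -> &I {
        &self.0
    }
}
```

With this in place, `Optional("cukes").len()` resolves to `str::len` through auto-deref, the same mechanism that lets `opt.input_len()` reach `InputLength` on the wrapped span.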