Bootstrap AST and parser #1

Merged 4 commits on Nov 18, 2021.
1 change: 1 addition & 0 deletions .clippy.toml
@@ -6,5 +6,6 @@ standard-macro-braces = [
    { name = "assert", brace = "(" },
    { name = "assert_eq", brace = "(" },
    { name = "assert_ne", brace = "(" },
    { name = "matches", brace = "(" },
    { name = "vec", brace = "[" },
]
5 changes: 4 additions & 1 deletion CHANGELOG.md
@@ -11,9 +11,12 @@ All user visible changes to `cucumber-expressions` crate will be documented in t

### Added

- ???
- [Cucumber Expressions] AST and parser. ([#1])

[#1]: /../../pull/1




[Cucumber Expressions]: https://github.com/cucumber/cucumber-expressions#readme
[Semantic Versioning 2.0.0]: https://semver.org
4 changes: 4 additions & 0 deletions Cargo.toml
@@ -3,6 +3,7 @@ name = "cucumber-expressions"
version = "0.1.0-dev"
edition = "2021"
rust-version = "1.56"
description = "Cucumber Expressions AST and parser."
license = "MIT OR Apache-2.0"
authors = [
    "Ilya Solovyiov <[email protected]>",
@@ -17,3 +18,6 @@ keywords = ["cucumber", "expression", "expressions", "cucumber-expressions"]
include = ["/src/", "/LICENSE-*", "/README.md", "/CHANGELOG.md"]

[dependencies]
derive_more = { version = "0.99.16", features = ["as_ref", "deref", "deref_mut", "display", "error"], default-features = false }
nom = "7.0"
nom_locate = "4.0"
162 changes: 162 additions & 0 deletions src/ast.rs
@@ -0,0 +1,162 @@
// Copyright (c) 2021 Brendan Molloy <[email protected]>,
// Ilya Solovyiov <[email protected]>,
// Kai Ren <[email protected]>
//
// Licensed under the Apache License, Version 2.0 <LICENSE-APACHE or
// http://www.apache.org/licenses/LICENSE-2.0> or the MIT license
// <LICENSE-MIT or http://opensource.org/licenses/MIT>, at your
// option. This file may not be copied, modified, or distributed
// except according to those terms.

//! [Cucumber Expressions][1] [AST][2] definitions.
//!
//! See details in the [grammar spec][3].
//!
//! [1]: https://github.com/cucumber/cucumber-expressions#readme
//! [2]: https://en.wikipedia.org/wiki/Abstract_syntax_tree
//! [3]: https://tinyurl.com/cucumber-expr-spec#grammar

use derive_more::{AsRef, Deref, DerefMut};
use nom::{error::ErrorKind, Err, InputLength};
use nom_locate::LocatedSpan;

use crate::parse;

/// [`str`] along with its location information in the original string.
pub type Spanned<'s> = LocatedSpan<&'s str>;

/// Top-level [`cucumber-expression`][3].
///
/// See [`parse::expression()`] for the detailed grammar and examples.
///
/// [3]: https://tinyurl.com/cucumber-expr-spec#grammar
#[derive(AsRef, Clone, Debug, Deref, DerefMut, Eq, PartialEq)]
pub struct Expression<Input>(pub Vec<SingleExpression<Input>>);

impl<'s> TryFrom<&'s str> for Expression<Spanned<'s>> {
    type Error = parse::Error<Spanned<'s>>;

    fn try_from(value: &'s str) -> Result<Self, Self::Error> {
        parse::expression(Spanned::new(value))
            // Unwrap `nom::Err` into the crate's own error type.
            .map_err(|e| match e {
                Err::Error(e) | Err::Failure(e) => e,
                Err::Incomplete(n) => parse::Error::Needed(n),
            })
            // Reject input that was not consumed completely.
            .and_then(|(rest, parsed)| {
                rest.is_empty()
                    .then(|| parsed)
                    .ok_or(parse::Error::Other(rest, ErrorKind::Verify))
            })
    }
}

impl<'s> Expression<Spanned<'s>> {
    /// Parses the given `input` as an [`Expression`].
    ///
    /// # Errors
    ///
    /// See [`parse::Error`] for details.
    pub fn parse<I: AsRef<str> + ?Sized>(
        input: &'s I,
    ) -> Result<Self, parse::Error<Spanned<'s>>> {
        Self::try_from(input.as_ref())
    }
}

/// Single entry of a [`cucumber-expression`][3].
///
/// See [`parse::single_expression()`] for the detailed grammar and examples.
///
/// [3]: https://tinyurl.com/cucumber-expr-spec#grammar
#[derive(Clone, Debug, Eq, PartialEq)]
pub enum SingleExpression<Input> {
    /// [`alternation`][3] expression.
    ///
    /// [3]: https://tinyurl.com/cucumber-expr-spec#grammar
    Alternation(Alternation<Input>),

    /// [`optional`][3] expression.
    ///
    /// [3]: https://tinyurl.com/cucumber-expr-spec#grammar
    Optional(Optional<Input>),

    /// [`parameter`][3] expression.
    ///
    /// [3]: https://tinyurl.com/cucumber-expr-spec#grammar
    Parameter(Parameter<Input>),

    /// Text without whitespace.
    Text(Input),

    /// Whitespace is treated as a special case to avoid the lookaheads and
    /// lookbehinds described in the [architecture][1]. This keeps parsing at
    /// `O(n)` complexity.
    ///
    /// [1]: https://tinyurl.com/cucumber-expr-spec
    Whitespace,
}

/// Allows matching one of several [`SingleAlternation`]s.
///
/// See [`parse::alternation()`] for detailed syntax and examples.
#[derive(AsRef, Clone, Debug, Deref, DerefMut, Eq, PartialEq)]
pub struct Alternation<Input>(pub Vec<SingleAlternation<Input>>);

/// Building block of an [`Alternation`].
pub type SingleAlternation<Input> = Vec<Alternative<Input>>;

impl<Input: InputLength> Alternation<Input> {
    /// Returns the length of the capture from `Input`: the lengths of all
    /// [`Alternative`]s, plus the `(` and `)` around each [`Optional`], plus
    /// the `/` separators between [`SingleAlternation`]s.
    pub(crate) fn span_len(&self) -> usize {
        self.0
            .iter()
            .flatten()
            .map(|alt| match alt {
                Alternative::Text(t) => t.input_len(),
                // `+ 2` for the surrounding `(` and `)`.
                Alternative::Optional(opt) => opt.input_len() + 2,
            })
            .sum::<usize>()
            // `N` single alternations are separated by `N - 1` slashes.
            + self.len()
            - 1
    }

    /// Indicates whether any of the [`SingleAlternation`]s consists only of
    /// [`Optional`]s.
    pub(crate) fn contains_only_optional(&self) -> bool {
        self.iter().any(|single_alt| {
            single_alt
                .iter()
                .all(|alt| matches!(alt, Alternative::Optional(_)))
        })
    }
}

/// [`alternative`][3] expression.
///
/// See [`parse::alternative()`] for the detailed grammar and examples.
///
/// [3]: https://tinyurl.com/cucumber-expr-spec#grammar
#[derive(Clone, Copy, Debug, Eq, PartialEq)]
pub enum Alternative<Input> {
    /// [`optional`][3] expression.
    ///
    /// [3]: https://tinyurl.com/cucumber-expr-spec#grammar
    Optional(Optional<Input>),

    /// Text.
    Text(Input),
Member Author

This deviates from the original grammar: the `parameter` variant is missing.

We need to discuss this on a voice call.

Member

@tyranron there is a `Note:` section which says:

> While parameter is allowed to appear as part of alternative and option in the AST, such an AST is not a valid a Cucumber Expression.

Basically, ARCHITECTURE.md describes an AST that is wider than a Cucumber Expression, for some reason (the only one I can think of is to make the parser implementation simpler and then error on conversion to a real Cucumber Expression).
But in practice it doesn't make much sense to me, especially the `alternative` definition:

alternative         = optional | parameter | text
text                = whitespace | ")" | "}" | .

This grammar suggests that an `alternative` may contain unescaped whitespace, which is not true: example. Whitespace has to be escaped there, at least to avoid ambiguity.

Member

@ilslv Nov 8, 2021

As I understand it, this isn't even EBNF: Wikipedia says repetition is written as `{...}`, not `(...)*`. It looks more like a regex, but still uses `,` for concatenation.

Member Author

@ilslv would you be so kind as to make a PR upstream that adjusts the described grammar to be accurate and precise enough? It really bothers me to have a spec that doesn't reflect reality, while implementations don't follow the spec 😕

Member Author

@tyranron Nov 8, 2021

@ilslv

> It's not even EBNF, as I understand, as wikipedia says that repetition is described with {...} and not with (...)*. It looks more like regex, but still has , for concatenation.

From your link: `*` is repetition, and `( ... )` is grouping. So we have a group repetition here, and I don't see any mistake in that. And the Markdown tags the block as `ebnf` 🤷‍♂️

}

/// Allows matching an optional `Input`.
///
/// See [`parse::optional()`] for detailed syntax and examples.
#[derive(AsRef, Clone, Copy, Debug, Deref, DerefMut, Eq, PartialEq)]
pub struct Optional<Input>(pub Input);

/// Allows matching some special `Input` described by a [`Parameter`] name.
///
/// See [`parse::parameter()`] for detailed syntax and examples.
#[derive(AsRef, Clone, Copy, Debug, Deref, DerefMut, Eq, PartialEq)]
pub struct Parameter<Input>(pub Input);
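To make the shape of the AST in `src/ast.rs` concrete, here is a simplified, self-contained mirror of its types with `&str` as `Input` (no `nom_locate` spans, no `derive_more`). It hand-builds the tree we would expect for `I have {int} cucumber(s)`. This diff does not show what `parse::expression()` actually returns, so the tree is our reading of the grammar rather than verified parser output, and `example()` and `render()` are hypothetical helpers that are not part of the PR.

```rust
// Simplified mirror of the `ast.rs` types with `&str` as `Input`.
#[derive(Clone, Debug, Eq, PartialEq)]
pub struct Optional<Input>(pub Input);

#[derive(Clone, Debug, Eq, PartialEq)]
pub struct Parameter<Input>(pub Input);

#[derive(Clone, Debug, Eq, PartialEq)]
pub enum Alternative<Input> {
    Optional(Optional<Input>),
    Text(Input),
}

// Each inner `Vec` is one `SingleAlternation` (one `/`-separated branch).
#[derive(Clone, Debug, Eq, PartialEq)]
pub struct Alternation<Input>(pub Vec<Vec<Alternative<Input>>>);

#[derive(Clone, Debug, Eq, PartialEq)]
pub enum SingleExpression<Input> {
    Alternation(Alternation<Input>),
    Optional(Optional<Input>),
    Parameter(Parameter<Input>),
    Text(Input),
    Whitespace,
}

#[derive(Clone, Debug, Eq, PartialEq)]
pub struct Expression<Input>(pub Vec<SingleExpression<Input>>);

/// Hypothetical, hand-built AST for `I have {int} cucumber(s)`.
pub fn example() -> Expression<&'static str> {
    Expression(vec![
        SingleExpression::Text("I"),
        SingleExpression::Whitespace,
        SingleExpression::Text("have"),
        SingleExpression::Whitespace,
        SingleExpression::Parameter(Parameter("int")),
        SingleExpression::Whitespace,
        SingleExpression::Text("cucumber"),
        SingleExpression::Optional(Optional("s")),
    ])
}

/// Hypothetical helper rendering an AST back to source text.
/// `Whitespace` carries no payload, so it is always rendered as one space.
pub fn render(expr: &Expression<&str>) -> String {
    let mut out = String::new();
    for item in &expr.0 {
        match item {
            SingleExpression::Text(t) => out.push_str(t),
            SingleExpression::Whitespace => out.push(' '),
            SingleExpression::Parameter(Parameter(name)) => {
                out.push('{');
                out.push_str(name);
                out.push('}');
            }
            SingleExpression::Optional(Optional(opt)) => {
                out.push('(');
                out.push_str(opt);
                out.push(')');
            }
            SingleExpression::Alternation(Alternation(branches)) => {
                for (i, branch) in branches.iter().enumerate() {
                    if i > 0 {
                        out.push('/'); // separator between branches
                    }
                    for alt in branch {
                        match alt {
                            Alternative::Text(t) => out.push_str(t),
                            Alternative::Optional(Optional(opt)) => {
                                out.push('(');
                                out.push_str(opt);
                                out.push(')');
                            }
                        }
                    }
                }
            }
        }
    }
    out
}
```

Because `Whitespace` is payload-free, such a round trip cannot distinguish a space from other whitespace; that is a consequence of the design choice of treating whitespace as a separate, data-less token.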
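The arithmetic in `Alternation::span_len()` is easiest to check on a concrete value. The sketch below re-implements both `Alternation` helpers over simplified `&str`-based stand-in types (not the crate's API), with `str::len()` standing in for nom's `InputLength::input_len()` and assuming `Text` holds the raw source slice. For `cat/dog(s)` the sum is 3 + 3 + (1 + 2) for the alternatives and the parentheses of the optional, plus 2 - 1 = 1 slash, i.e. 10, which is exactly the length of the source text.

```rust
/// Stand-in for `Optional<&str>` from `ast.rs`.
#[derive(Clone, Copy, Debug, Eq, PartialEq)]
pub struct Optional<'s>(pub &'s str);

/// Stand-in for `Alternative<&str>` from `ast.rs`.
#[derive(Clone, Copy, Debug, Eq, PartialEq)]
pub enum Alternative<'s> {
    Optional(Optional<'s>),
    Text(&'s str),
}

/// Each inner `Vec` is one `SingleAlternation` (one `/`-separated branch).
#[derive(Clone, Debug, Eq, PartialEq)]
pub struct Alternation<'s>(pub Vec<Vec<Alternative<'s>>>);

impl<'s> Alternation<'s> {
    /// Length of the source text this alternation spans.
    pub fn span_len(&self) -> usize {
        let content: usize = self
            .0
            .iter()
            .flatten()
            .map(|alt| match alt {
                Alternative::Text(t) => t.len(),
                // `+ 2` for the surrounding `(` and `)`.
                Alternative::Optional(Optional(opt)) => opt.len() + 2,
            })
            .sum();
        // N branches are joined by N - 1 `/` separators. Like the original,
        // this assumes at least one branch.
        content + self.0.len() - 1
    }

    /// `true` if any branch consists only of optionals.
    pub fn contains_only_optional(&self) -> bool {
        self.0.iter().any(|branch| {
            branch
                .iter()
                .all(|alt| matches!(alt, Alternative::Optional(_)))
        })
    }
}
```

The `any`/`all` form expresses the same check as an explicit loop with an early `return true`.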
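`TryFrom<&str>` for `Expression` treats a successful parse that leaves input behind as an error (`ErrorKind::Verify`), so `Expression::parse()` only succeeds when the whole string is consumed. The stdlib-only sketch below reproduces that pattern with a hypothetical `word` parser and `ParseError` type (neither exists in the crate) in place of `parse::expression()` and `parse::Error`. It also shows why a generic `AsRef<str>` parameter taken by reference needs a `?Sized` bound to accept a plain `&str`: without it, `I` cannot be the unsized `str`.

```rust
/// Hypothetical error type; `Trailing` plays the role of
/// `parse::Error::Other(rest, ErrorKind::Verify)` in `ast.rs`.
#[derive(Debug, Eq, PartialEq)]
pub enum ParseError<'s> {
    /// Nothing could be parsed at all.
    Empty,
    /// Parsing succeeded but left this input unconsumed.
    Trailing(&'s str),
}

/// Hypothetical nom-style parser: consumes leading ASCII letters and
/// returns `(rest, parsed)`.
pub fn word(input: &str) -> Result<(&str, &str), ParseError<'_>> {
    let end = input
        .find(|c: char| !c.is_ascii_alphabetic())
        .unwrap_or(input.len());
    if end == 0 {
        return Err(ParseError::Empty);
    }
    Ok((&input[end..], &input[..end]))
}

/// Accepts `str`, `String`, etc. by reference thanks to `?Sized`.
pub fn parse_all<I: AsRef<str> + ?Sized>(
    input: &I,
) -> Result<&str, ParseError<'_>> {
    word(input.as_ref()).and_then(|(rest, parsed)| {
        // Same shape as `rest.is_empty().then(|| parsed).ok_or(...)`.
        rest.is_empty()
            .then(|| parsed)
            .ok_or(ParseError::Trailing(rest))
    })
}
```

Returning the leftover slice in the error, as the crate does with `rest`, lets callers point at the exact position where parsing stopped.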
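`Optional` and `Parameter` derive `AsRef`, `Deref` and `DerefMut` via `derive_more`, so the wrapped `Input` can be used directly; this is what lets `span_len()` call `opt.input_len()` on an `&Optional<Input>`. Assuming `derive_more`'s default (non-forwarding) behavior for a single-field tuple struct, a hand-written equivalent of the two `Deref` derives looks roughly like this:

```rust
use std::ops::{Deref, DerefMut};

/// Hand-written stand-in for `#[derive(Deref, DerefMut)]` on `Optional`.
#[derive(Clone, Copy, Debug, Eq, PartialEq)]
pub struct Optional<Input>(pub Input);

impl<Input> Deref for Optional<Input> {
    type Target = Input;

    // Dereferences to the wrapped field.
    fn deref(&self) -> &Input {
        &self.0
    }
}

impl<Input> DerefMut for Optional<Input> {
    // Mutable access to the wrapped field.
    fn deref_mut(&mut self) -> &mut Input {
        &mut self.0
    }
}
```

With these impls, method calls auto-deref through the wrapper, e.g. `Optional("s").len()` resolves to `str::len`.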