
Updating to Nom 7 has broken double("NaN") #1384

Closed
StephenWakely opened this issue Sep 9, 2021 · 13 comments · Fixed by #1455 · May be fixed by #1391

Comments

@StephenWakely

╰─$ rustc -V
rustc 1.54.0 (a178d0322 2021-07-26)

Given the following code:

use nom::{error::Error, number::complete::double};
fn main() {
    println!("{:?}", double::<_, Error<_>>("NaN"));
}

In Nom version 6, this runs and returns:

Ok(("", NaN))

In Nom version 7, this returns:

Err(Error(Error { input: "NaN", code: Float }))

Maybe this is intentional? I'm not seeing anything in the release notes about it, however.

@Stargateur
Contributor

#1309:

Does not handle special values (NaN, Infinity).

Personally, I do not understand why we took minimal-lexical; there are so many cons. Unfortunately, I only saw the change once it was already done. Rust 1.55 just landed with a new float parser that is expected to be fast (https://blog.rust-lang.org/2021/09/09/Rust-1.55.0.html#faster-more-correct-float-parsing); we should maybe consider using it instead.

@Alexhuszagh
Contributor

Alexhuszagh commented Sep 10, 2021

#1309:

Does not handle special values (NaN, Infinity).

Personally, I do not understand why we took minimal-lexical; there are so many cons. Unfortunately, I only saw the change once it was already done. Rust 1.55 just landed with a new float parser that is expected to be fast (https://blog.rust-lang.org/2021/09/09/Rust-1.55.0.html#faster-more-correct-float-parsing); we should maybe consider using it instead.

I'm the author of that PR, and the author of minimal-lexical. Geal/nom doesn't support only v1.55; it supports v1.48+. But ideally, once the MSRV increases, yes, this dependency should be removed. Currently, the performance gap between the 1.48-1.54 versions of float parsing is very notable, so minimal-lexical is meant to patch that gap. Also, NaN parsing isn't handled by minimal-lexical, but I can submit a PR here to fix that.

@Alexhuszagh
Contributor

OK, I see where the bug is: the implementation does not check for special values right now, and minimal-lexical doesn't handle them internally. I'll submit a PR shortly.
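
For anyone hitting this before the fix lands, a minimal workaround sketch using only nom 7's public combinators: match the special literals first, then fall back to the stock `double` parser. The name `lenient_double` is just illustrative, not part of nom's API.

use nom::{
    branch::alt,
    bytes::complete::tag_no_case,
    character::complete::char,
    combinator::{map, opt, value},
    number::complete::double,
    sequence::pair,
    IResult,
};

// Try the special literals first, then fall back to the stock `double` parser.
fn lenient_double(input: &str) -> IResult<&str, f64> {
    alt((
        // "nan" / "NaN" / "NAN" ...
        value(f64::NAN, tag_no_case("nan")),
        // optionally signed "infinity" / "inf"
        map(
            pair(opt(char('-')), alt((tag_no_case("infinity"), tag_no_case("inf")))),
            |(sign, _)| if sign.is_some() { f64::NEG_INFINITY } else { f64::INFINITY },
        ),
        double,
    ))(input)
}

fn main() {
    assert_eq!(lenient_double("inf"), Ok(("", f64::INFINITY)));
    assert!(lenient_double("NaN").unwrap().1.is_nan());
}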

@Stargateur
Contributor

I'm concerned about the performance penalty this causes; because Geal compared benchmarks with different features enabled, it was an unfair benchmark.

@Alexhuszagh
Contributor

Alexhuszagh commented Sep 10, 2021

I'm concerned about the performance penalty this causes; because Geal compared benchmarks with different features enabled, it was an unfair benchmark.

I haven't seen Geal's exact benchmarks, but I can tell you the performance is a fair bit faster prior to 1.55.0 for a relatively comprehensive suite of floats. After 1.55.0, the standard library should be a bit faster if we didn't have to tokenize the float (but that isn't the case), so minimal-lexical should actually still be faster, since core does a lot of duplicate work. This is because there is no support for a partial parser in core.

And, prior to 1.55.0, once again, the algorithm for float parsing was substantially worse, which is why I spent a lot of time implementing that. If after these fixes, you want me to benchmark the difference, I'd be happy to do so.
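
To make the "partial parser" point concrete (my own illustration, not from the thread): libstd's `str::parse::<f64>()` only accepts a complete float token, while nom's `double` is expected to consume a numeric prefix and hand back the rest, so any caller of the std parser has to tokenize first.

use nom::{error::Error, number::complete::double};

fn main() {
    // libstd rejects trailing, non-numeric input outright...
    assert!("1.5 apples".parse::<f64>().is_err());
    // ...while nom's `double` consumes the numeric prefix and returns the remainder.
    assert_eq!(double::<_, Error<_>>("1.5 apples"), Ok((" apples", 1.5)));
}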

@Stargateur
Contributor

Stargateur commented Sep 10, 2021

My problem is that, while I don't want to say your work isn't amazing, I would like a more neutral argument. Since you are the creator of minimal-lexical and the owner of lexical-core, you have praised your own crate; that's not wrong, but it makes me cautious about what you claim. In the original issue you said:

Stable on no_std (coming to lexical-core in v0.8).
Uses the Eisel-Lemire algorithm, providing faster float parsing (coming to lexical-core in v0.8).

But lexical-core is now at 0.8, which means your list of pros has become smaller, and on top of that I don't really agree with some of them:

Faster compile times.

I don't care; Rust is slow anyway and cargo check does the job. Also, lexical-core claims to be faster now.

Stable API with a single exported trait and function.

I don't know much about this, so no opinion; I expect lexical-core to have something similar.

Dependency-free when disabling no_alloc feature.

Why would a dependency be a problem, especially when the only dependencies of https://crates.io/crates/lexical-core/0.8.0/dependencies are internal tools used to split the crate?

Easier support for custom float formats (may be useful to support parsing JSON-conforming floats, which could be a valid use-case).

The purpose of nom's double is to parse standard floats (the same as std, I think), so that is not a pro for nom.

Larger version support (minimal-lexical supports 1.31.0+, while lexical-core requires 1.37.0+).

lexical-core 0.8 requires 1.51.0. I can't say it's not a pro, but personally it's a meh; nom 7 would just have had to require a slightly more recent version.

And we end with two cons:

Must tokenize the float in Nom.
Does not handle special values (NaN, Infinity).

I would add one:

Much less used and less well maintained.

The main pro was the benchmark, but what are the numbers versus lexical-core 0.8 (obviously with NaN and Infinity handled)? If the numbers with special values handled show lexical-core is better, I will see no reason not to pick lexical-core again. If the numbers show minimal-lexical wins, well, nice. (That is, of course, taking into account that nom will be the slow part of the benchmark for minimal-lexical.)

@Alexhuszagh
Contributor

Alexhuszagh commented Sep 10, 2021

I don't mind either way about using lexical-core vs. minimal-lexical (hi 👋👋 I maintain both), and I'm happy to ensure that people don't have to use my crates, which is why I implemented the PR into Rust core and have made numerous PRs to fast_float (the C++ implementation) and fast-float-rust: I want the ecosystem to win.

My problem is that, while I don't want to say your work isn't amazing, I would like a more neutral argument. Since you are the creator of minimal-lexical, you have praised your own crate; that's not wrong, but it makes me cautious about what you claim. In the original issue you said:

But lexical-core is now at 0.8, which means your list of pros has become smaller, and on top of that I don't really agree with some of them:

minimal-lexical is different from lexical-core. lexical-core v0.8 is certainly faster to compile than v0.7, but it can't compete with a dedicated, small, isolated crate.

I don't care; Rust is slow anyway and cargo check does the job. Also, lexical-core claims to be faster now.

Many people do care about compile times.

I don't know much about this, so no opinion; I expect lexical-core to have something similar.

lexical-core does have a relatively simple API, but once again, it's not possible to have only one function in lexical-core. The high-level API is ~4 parsing functions, although it's relatively moot: the main point is compile times.

Dependency-free when disabling no_alloc feature.

Why would a dependency be a problem, especially when the only dependencies of https://crates.io/crates/lexical-core/0.8.0/dependencies are internal tools used to split the crate?

Because dependencies add compile-time issues? I added libm as a dependency to lexical-core v0.7 for stable no_std support, and it alone added 2s to every build. A lot of work went into removing libm as a dependency entirely.

Easier support for custom float formats (may be useful to support parsing JSON-conforming floats, which could be a valid use-case).

The purpose of nom's double is to parse standard floats (the same as std, I think), so that is not a pro for nom.

I mean, sure, but it's also a useful potential feature later.

Larger version support (minimal-lexical supports 1.31.0+, while lexical-core requires 1.37.0+).

lexical-core 0.8 requires 1.51.0. I can't say it's not a pro, but personally it's a meh; nom 7 would just have had to require a slightly more recent version.

nom 7 has already committed to supporting v1.48+, and lexical-core 0.8 cannot be rewritten to support versions older than 1.51, due to the const generics integral to the new implementation.

And we end with two cons:

Must tokenize the float in Nom.
Does not handle special values (NaN, Infinity).

I would add one:

Much less used and less well maintained.

minimal-lexical is actually used in a private fork by serde-json, so it's unquestionably more used. The maintainers are the same for both lexical-core and minimal-lexical, and maintenance plans have been put in motion should I be unable to maintain them.

The main pro was the benchmark, but what are the numbers versus lexical-core 0.8 (obviously with NaN and Infinity handled)? If the numbers with special values handled show lexical-core is better, I will see no reason not to pick lexical-core again. If the numbers show minimal-lexical wins, well, nice.

Sure, but there is a compile time difference. minimal-lexical compiles on my computer in ~0.5s, while lexical-core v0.8 with only the float-parsing is ~1.5s. This is substantially faster than the nearly 3.5s of v0.7, but it's worth considering there's a tradeoff for both cases.

@Alexhuszagh
Contributor

Alexhuszagh commented Sep 10, 2021

Anyway, I'm fine with any of the options:

  1. Patching this bug (already doing that) and keeping minimal-lexical.
  2. Moving to fast-float-rust, which is like lexical-core but much leaner (although it currently has a significantly slower slow-path algorithm, since it uses a decimal class rather than big-integer arithmetic). I've opened an issue so this can be fixed, and am currently implementing a PR for both core and fast-float-rust to ensure these algorithm improvements benefit everyone.
  3. Moving back to lexical-core, but v0.8 cannot be used which negates many of the upsides of the migration. v0.7 has some notable dependencies, such as arrayvec and libm, which do substantially increase build times. There's also slower algorithms used, which is partially why it was such a major re-write.

@Stargateur
Contributor

Stargateur commented Sep 10, 2021

Sure, but there is a compile time difference. minimal-lexical compiles on my computer in ~0.5s, while lexical-core v0.8 with only the float-parsing is ~1.5s. This is substantially faster than the nearly 3.5s of v0.7, but it's worth considering there's a tradeoff for both cases.

I'm not talking about compile time here, but about benchmarks of actually doing the job. That, in my opinion, is what matters most in a crate like nom.

nom 7 has already committed to supporting v1.48+, and lexical-core 0.8 cannot be rewritten to support versions older than 1.51, due to the const generics integral to the new implementation.

You misunderstood what I wanted to say: I mean that nom 7 could have required a more recent Rust version if needed. That would not have been a big deal.

  1. Patching this bug (already doing that) and keeping minimal-lexical.

  2. Moving to fast-float-rust, which is like lexical-core but much leaner (although it currently has a significantly slower slow-path algorithm, since it uses a decimal class rather than big-integer arithmetic). I've opened an issue so this can be fixed, and am currently implementing a PR for both core and fast-float-rust to ensure these algorithm improvements benefit everyone.

  3. Moving back to lexical-core, but v0.8 cannot be used which negates many of the upsides of the migration. v0.7 has some notable dependencies, such as arrayvec and libm, which do substantially increase build times. There's also slower algorithms used, which is partially why it was such a major re-write.

  1. It depends: is that what Geal wants for numbers? Since there is no test for that, maybe it's not the case. I think nom should handle the same things std guarantees to handle, i.e. the EBNF grammar. If people want something specific, they will need to write their own parser and use your crate directly.
  2. Without a good reason, I don't see why we should do that.
  3. For nom 8, maybe; for nom 7 it's too late. But again, we need benchmarks to make the choice, and benchmarks that take all features into account. Now that we use minimal-lexical, I will want a good reason to change to lexical-core :p

@Alexhuszagh
Contributor

I'm not talking about compile time here, but about benchmarks of actually doing the job. That, in my opinion, is what matters most in a crate like nom.

I understood; I've pointed out numerous times that I don't agree. Compile time isn't the only factor, though: minimal-lexical is also fairly performant.

But again, we need benchmarks to make the choice, and benchmarks that take all features into account. Now that we use minimal-lexical, I will want a good reason to change to lexical-core :p

I appreciate that. Anyway, I'll do my best to ensure float parsing is viable for everything. In the future, if the new slow-path algorithms are implemented in fast-float-rust, that might be the most logical option.

@Geal
Collaborator

Geal commented Sep 25, 2021

so, to sum up here:

  • the new float parser in libstd is great, but as @Alexhuszagh pointed out, it is not compatible with nom's MSRV, so we'll use minimal-lexical for now
  • we could go back to lexical-core, but that could only happen on the next MSRV raise, i.e. nom 8
  • maybe it's faster with minimal-lexical, or lexical-core, or fast-float-rust. We won't know until an actual benchmark is done

I think we can wait a bit before considering a change of float parsing library.

Personally, I am not fond of arguing about the right way to parse floats in nom: the parsers were added for convenience (and for benchmarks, unfortunately), but there's no guarantee that they would parse exactly like people expect for their formats. So they have to provide a minimal feature set for prototyping; then users can write their own tokenizer and pass it to something like minimal-lexical or libstd's new float parser, or rely directly on lexical-core and its various format options.
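
As a rough sketch of the "own tokenizer" route described above (my illustration, assuming Rust 1.55+ so libstd's parser is the fast one; `std_double` is a hypothetical name): nom's `recognize_float` can produce the token and libstd can do the conversion. Note that `recognize_float` does not match the special literals, so "NaN"/"inf" would still need a separate branch such as the one sketched earlier in this thread.

use nom::{combinator::map_res, number::complete::recognize_float, IResult};

// Tokenize the float with nom, then let libstd's parser do the conversion.
fn std_double(input: &str) -> IResult<&str, f64> {
    map_res(recognize_float, str::parse::<f64>)(input)
}

fn main() {
    assert_eq!(std_double("6.02e23 rest"), Ok((" rest", 6.02e23)));
}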

@StephenWakely
Author

For the record, I have worked around this issue in our app so there is no urgency on my part to resolve this. Nor do I have a particularly strong opinion on the correct way it should be done.

@mullr

mullr commented Oct 22, 2021

#1421 (comment) works around this issue as well.
