Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

use reference counted prefix for less lifetimes on params? #28

Open
fakeshadow opened this issue Dec 7, 2022 · 7 comments
Open

use reference counted prefix for less lifetimes on params? #28

fakeshadow opened this issue Dec 7, 2022 · 7 comments
Labels
enhancement New feature or request

Comments

@fakeshadow
Copy link

fakeshadow commented Dec 7, 2022

struct Node {
    prefix: Arc<[u8]>
}

struct Param<'v> {
    key: BytesStr,
    value: &'v u8
}

struct BytesStr(Arc<[u8]>);

impl ByteStr {
    fn as_str(&self) -> &str {
        std::str::from_utf8(&self.0[1..]).unwrap()
    }
}

A trade off between less lifetime and reference counted overhead + an extra public type. Current Param(s) type are pretty hard to use if not consumed inline. From what I see most people would do an extra heap allocation to erase the lifetime(s) so in that case this would likely be cheaper.

@ibraheemdev
Copy link
Owner

Wouldn't most people allocate again anyways (if its used in a Path extractor for example)? Unless it used the bytes crate maybe.

@fakeshadow
Copy link
Author

On a further look a zero copy value is also possible in form like this:

struct Param {
    key: BytesStr,
    value: BytesStr
}
impl Node {
    fn at(&self, path: impl Into<ByteStr>) {}
}

This would hit the perf even further when given a str to match with one extra copy and a couple of reference counted slicing on it. But in real world http can possibly be used to hand out an already existing ByteStr to enable zero copy. Related code: https://github.com/hyperium/http/blob/34dc1ccb4786c8026e57404fa5e71275978a55f3/src/uri/path.rs#L13

@dpc
Copy link

dpc commented Sep 11, 2023

I like that Params are zero-copy, and I haven't noticed usability issues with the lifetimes. In my toy astra/mathcing project I barely every write down the lifetimes: https://github.com/dpc/htmx-sorta/blob/5e9d2fe026ad9e2baae20368e59c7237c0671161/src/routes.rs#L64

The only problem I've noticed is the one described in #40 and describe there a way to delay any allocations to the point where actual path arguments are known to exist. In the (I assume common case) where there is None, no allocations need to happen.I use Box there, but Arc<_>, Cow<_> etc would work fine as well.

@ibraheemdev
Copy link
Owner

http/hyper already uses Bytes internally, though not fully exposed in it's public API, so we might still be able to be zero-copy in the common use case.

@fakeshadow
Copy link
Author

It's when you use it in a complicated web framework abstraction the lifetime pollution becomes apparent. This is how matchit is used in axum: https://github.com/tokio-rs/axum/blob/368c3ee08fc3896358d3bd2bfc8cc67f2925c6ef/axum/src/routing/url_params.rs#L24
There are mainly two reason for less practical use of reference based param in this case: Exposing lifetime in public API like Service<Request<'a>> is harder to use for library consumer and typical type map for runtime type casting require Any which requires 'static lifetime.
In xitca I use a fork of matchit with ref count based string type and inline optimized small string which is both easier to use in complicated abstraction and faster with less allocation.

http/hyper already uses Bytes internally, though not fully exposed in it's public API, so we might still be able to be zero-copy in the common use case.

There is a related issue: hyperium/http#484 though it's unlikely to happen before some form of widely used ref count string type exists (either in std or 3rd party).

@ibraheemdev
Copy link
Owner

ibraheemdev commented Sep 12, 2023

I guess the most generic way of supporting this would be to return a Range<usize> for values instead of a string slice and punting the choice of string type to users, though you would still have to re-allocate a Params list. For lifetime-less keys though we'd still have to use some sort of Arc<str> internally.

@dpc
Copy link

dpc commented Sep 12, 2023

It's when you use it in a complicated web framework abstraction the lifetime pollution becomes apparent. This is how matchit is used in axum: https://github.com/tokio-rs/axum/blob/368c3ee08fc3896358d3bd2bfc8cc67f2925c6ef/axum/src/routing/url_params.rs#L24

That doesn't look bad. Seems it comes down to "Params is holding references to stuff and has a lifetime".

Anyway, I agree that Params is not best for general purpose consumption. My observation with #40 is that can do route matching and get a cheap Params, and then you convert it to something else and due to the knowledge of what was parsed it might be cheaper (free in many cases) than cloning path/url upfront.

I thought that maybe matchit should support it internally, but now it occurred to me that it would be already possible to do it externally, if matchit provided some facilities for it (expose some internals, or some methods to iterate over it). What you actually convert it to does not even have to be one and only one thing. You have your favorite short-string library etc., you can use it.

These two effort could be combined. Matching could support internally some conversions to representations that are more useful for general purpose code at the cost of extra allocations etc, and then also enable developers converting to whatever they want as well.

If hyper can expose url as Bytes etc. that would change the situation and make more things possible.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

3 participants