Chore: Replace Contains Key with Entry #4483

Open · wants to merge 20 commits into base: next

Conversation

@ASuciuX (Contributor) commented Mar 5, 2024

Benchmark Results

hyperfine -w 3 -r 10 "./stacks-inspect-383d586a7-4-mar replay-block ~/chainstate range 99990 100000" "./stacks-inspect-with-entry-4-mar replay-block ~/chainstate range 99990 100000" "./stacks-inspect-immutable-entries replay-block ~/chainstate range 99990 100000"
Benchmark 1: ./stacks-inspect-383d586a7-4-mar replay-block ~/chainstate range 99990 100000
  Time (mean ± σ):      7.977 s ±  0.060 s    [User: 7.584 s, System: 0.365 s]
  Range (min … max):    7.934 s …  8.105 s    10 runs
 
Benchmark 2: ./stacks-inspect-with-entry-4-mar replay-block ~/chainstate range 99990 100000
  Time (mean ± σ):      7.918 s ±  0.024 s    [User: 7.524 s, System: 0.364 s]
  Range (min … max):    7.891 s …  7.965 s    10 runs
 
Benchmark 3: ./stacks-inspect-immutable-entries replay-block ~/chainstate range 99990 100000
  Time (mean ± σ):      7.946 s ±  0.012 s    [User: 7.554 s, System: 0.362 s]
  Range (min … max):    7.925 s …  7.966 s    10 runs
 
Summary
  ./stacks-inspect-with-entry-4-mar replay-block ~/chainstate range 99990 100000 ran
    1.00 ± 0.00 times faster than ./stacks-inspect-immutable-entries replay-block ~/chainstate range 99990 100000
    1.01 ± 0.01 times faster than ./stacks-inspect-383d586a7-4-mar replay-block ~/chainstate range 99990 100000

The benchmarks above show the performance for the following cases:

  1. the default run
  2. updating only the mutable occurrences
  3. updating both the mutable and the immutable occurrences

The best result is the 2nd case, with the mutable occurrences updated and the immutable ones left as default.

Applicable issues

codecov bot commented Mar 5, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 83.41%. Comparing base (25311e4) to head (283f46b).
Report is 14 commits behind head on next.

Additional details and impacted files
@@            Coverage Diff             @@
##             next    #4483      +/-   ##
==========================================
- Coverage   85.25%   83.41%   -1.85%     
==========================================
  Files         471      471              
  Lines      337762   337762              
  Branches      317      317              
==========================================
- Hits       287955   281728    -6227     
- Misses      49799    56026    +6227     
  Partials        8        8              
Files Coverage Δ
clarity/src/vm/ast/traits_resolver/mod.rs 98.48% <100.00%> (ø)
clarity/src/vm/costs/mod.rs 81.78% <100.00%> (+0.01%) ⬆️
clarity/src/vm/types/mod.rs 91.70% <100.00%> (-0.01%) ⬇️

... and 65 files with indirect coverage changes



Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 25311e4...283f46b.

Two resolved (outdated) review threads on clarity/src/vm/ast/traits_resolver/mod.rs
- fallback to `contains_key()` in places where inserts aren't made
- swap `get() ... insert()` on maps with `get_mut() += value` where available
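(A minimal sketch of the two patterns listed above, using hypothetical map and function names rather than code from this PR:)

use std::collections::HashMap;

// Hypothetical counter map, only to illustrate the two changes above.
fn record(counts: &mut HashMap<String, u64>, key: &str) -> bool {
    // 1. A pure membership check stays on contains_key(): no insert follows,
    //    so the entry() API buys nothing here.
    let seen_before = counts.contains_key(key);

    // 2. A get() ... insert() pair becomes a single get_mut() when the key
    //    already exists, avoiding the second lookup.
    if let Some(c) = counts.get_mut(key) {
        *c += 1;
    } else {
        counts.insert(key.to_string(), 1);
    }

    seen_before
}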
@ASuciuX (Contributor, Author) commented Mar 8, 2024

I've run the benchmark again after the latest commit (a8597f1), compared to the previous one (885b6e5). The results show a 2% decrease in block processing time, up from the initial 1%:

Benchmark 1: ./stacks-inspect-entry-no-inserts replay-block ~/chainstate range 99990 100000
  Time (mean ± σ):      6.694 s ±  0.011 s    [User: 6.310 s, System: 0.350 s]
  Range (min … max):    6.680 s …  6.711 s    10 runs
 
Benchmark 2: ./stacks-inspect-entry-inserts replay-block ~/chainstate range 99990 100000
  Time (mean ± σ):      6.587 s ±  0.009 s    [User: 6.227 s, System: 0.325 s]
  Range (min … max):    6.573 s …  6.606 s    10 runs
 
Summary
  ./stacks-inspect-entry-inserts replay-block ~/chainstate range 99990 100000 ran
    1.02 ± 0.00 times faster than ./stacks-inspect-entry-no-inserts replay-block ~/chainstate range 99990 100000

In this case, Benchmark 1 is 885b6e5, and Benchmark 2 is a8597f1.

@jbencin (Contributor) left a comment

Looks good, just some minor changes left to make and I'll approve

Resolved (outdated) review threads on: clarity/src/vm/database/key_value_wrapper.rs, stackslib/src/net/dns.rs, testnet/stacks-node/src/chain_data.rs, stackslib/src/net/inv/nakamoto.rs
jbencin previously approved these changes Mar 11, 2024
@jcnelson (Member) left a comment

See earlier comment.


EDIT: I see my email comment didn't land on this PR for some reason. Here's what it was meant to say:

I'm really not a fan of the entry API. It takes several (sometimes 10x) lines of code to express that which can be done in a single line with contains_key(), so code legibility and density are things that would be sacrificed by merging this PR. Moreover, to use this API, you needed to manually transform lots of code on consensus-critical code paths, which makes merging this PR super risky. Finally, the speed improvements are basically nonexistent (a questionable 1% speedup?), which means the wins from the PR are basically nonexistent.

As is, I'm a NACK.

@jbencin (Contributor) commented Mar 11, 2024

It takes several (sometimes 10x) lines of code to express that which can be done in a single line with contains_key()

This PR is currently at +275, −247. So there is an increase in line count, but it's pretty small

Moreover, to use this API, you needed to manually transform lots of code on consensus-critical code paths, which makes merging this PR super risky

It's a simple and straightforward transformation, similar to transforming if statements into match statements. It also lets us remove some ugly .expect() calls like this:

let resp = self
    .requests
    .remove(&req)
    .expect("BUG: had key but then didn't")
    .expect("BUG: had response but then didn't");

IMO this makes the entry API safer to use than indexing a HashMap multiple times with other methods
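(A minimal sketch of that point, with hypothetical types rather than the actual dns.rs code: an Occupied entry lets the lookup and the removal share one hash and drops the expect() entirely.)

use std::collections::{hash_map::Entry, HashMap};

// Hypothetical request table; the real dns.rs types differ.
fn take_response(requests: &mut HashMap<u64, String>, req: u64) -> Option<String> {
    // contains_key() + remove() would hash `req` twice and still need an
    // expect() to unwrap the second lookup; the Occupied variant itself
    // proves the key is present.
    match requests.entry(req) {
        Entry::Occupied(e) => Some(e.remove()),
        Entry::Vacant(_) => None,
    }
}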

The unit and integration tests are currently passing, but we can also do some extra testing here

Finally, the speed improvements are basically nonexistent (a questionable 1% speedup?), which means the wins from the PR are basically nonexistent

The numbers in the description are incorrect; it's about a 2% speedup in block processing. See the second comment on this PR

@jcnelson (Member) commented Mar 11, 2024

It's a simple and straightforward transformation, similar to transforming if statements into match statements. It also lets us remove some ugly .expect() calls like this:

I'm really not seeing how this:

match foo.entry(bar) {
   Entry::Occupied(value) => {
      /* do something */
   }
   Entry::Vacant(e) => {
      /* do something else */
   }
}

is a legibility improvement over this:

if foo.contains_key(&bar) {
   /* do something */
} else {
   /* do something else */
}

The former has two levels of indentation, whereas the latter has one. Also, the former has 6 lines of boilerplate, whereas the latter has 3. We've been trying to reduce gratuitous indentation for the past few years because it aids comprehensibility, but this PR works against that.

Do we know for sure that using the Entries API is required to gain that 2% speed improvement? Because, if the transformation is really that straightforward, then the compiler really should be doing it internally (or have an option to do so).

@jbencin (Contributor) commented Mar 11, 2024

I'm really not seeing how this... is a legibility improvement over this

I don't know about that. Sure, the second example is less text, but the first is more explicit because both cases are clearly labeled, rather than using an else for one of them. This example also misses that, when using Entry, the code inside those blocks can be much more readable because there's no need to access the HashMap again. Consider this example from the PR:

if !self.prune_inbound_counts.contains_key(prune) {
    self.prune_inbound_counts.insert(prune.clone(), 1);
} else {
    let c = self.prune_inbound_counts.get(prune).unwrap().to_owned();
    self.prune_inbound_counts.insert(prune.clone(), c + 1);
}

vs.

match self.prune_inbound_counts.entry(prune.to_owned()) {
    Entry::Occupied(mut e) => {
        *e.get_mut() += 1;
    }
    Entry::Vacant(e) => {
        e.insert(1);
    }
}

Line count went up, but the second is fewer characters and much more readable. In the first, you have to read it and think about what it's doing, whereas it's obvious as soon as you look at the second that you're incrementing a counter. Also, there's no unwrap() needed when using the Entry
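(For reference, the same single-hash counter update can also be written with the Entry combinators - a minimal sketch with hypothetical names, not code from the PR:)

use std::collections::HashMap;

fn bump_count(counts: &mut HashMap<String, u64>, key: &str) {
    // and_modify() runs only if the key is occupied; or_insert() only if it is vacant.
    counts
        .entry(key.to_owned())
        .and_modify(|c| *c += 1)
        .or_insert(1);
}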

Do we know for sure that using the Entries API is required to gain that 2% speed improvement?

The hyperfine stacks-inspect replay-block benchmark we use is very consistent between runs, so I'm confident this represents a performance gain. This is not surprising, as it halves the number of hashes needed in some places.

The perf gains are probably entirely from the few changes in clarity. We should at least merge those. I'm not sure the other changes in stackslib are even used in the replay-block benchmark

Because, if the transformation is really that straightforward, then the compiler really should be doing that internally (or have an option to do so)

You'd think, but during my testing I've been surprised by what the compiler doesn't optimize away at -O3

@jcnelson (Member) commented:

The perf gains are probably entirely from the few changes in clarity. We should at least merge those. I'm not sure the other changes in stackslib are even used in the replay-block benchmark

Yup, 100% on board with this. Now that you mention it, it was surprising that these changes impacted code outside of stacks_common and clarity.

Line count went up, but the second is fewer characters and much more readable. In the first, you have to read it and think about what it's doing, whereas it's obvious as soon as you look at the second that you're incrementing a counter. Also, there's no unwrap() needed when using the Entry

The example you're citing here could also be refactored as follows, with less boilerplate and the same number of clones:

if let Some(c) = self.prune_inbound_counts.get_mut(prune) {
   *c += 1;
} else {
   self.prune_inbound_counts.insert(prune.clone(), 1);
}

Granted, there's a lot of really old code in the stackslib/src/net directory that was written before I had a good handle on Rust, and it could really use a spit-shine.

@jbencin (Contributor) commented Mar 12, 2024

Also, I think using if let with .entry() is the cleanest way to insert a key iff it doesn't already exist. Another example from the PR:

if !lowest_reward_cycle_with_missing_block.contains_key(nk) {
    lowest_reward_cycle_with_missing_block.insert(nk.clone(), reward_cycle);
}

vs.

if let Entry::Vacant(e) = lowest_reward_cycle_with_missing_block.entry(nk.clone()) {
    e.insert(reward_cycle);
}

Aside from the Clarity perf gains, there are some clear readability improvements and .unwrap() or .expect() eliminations in this PR that I think are worth keeping as well

@jcnelson (Member) commented:

I disagree -- I think it requires more cognitive overhead. Instead of reading "okay, insert this thing into the hash table if it doesn't have it", I'm reading "okay, get this opaque pointer-that-is-not-a-pointer thing ("entry") via a let-binding, and then call this deref-and-store-that-is-not-a-deref-and-store function on it ("insert") to put this thing into the hash table". This is why I avoid the Entry API in my code -- I think it creates a needless layer of indirection that can be avoided with the containers' APIs.

That all said, I'm not really keen to pursue the bike-shedding further since I don't think we're going to ever come to agreement. Also, it's off-topic.

The purpose of the workstream to which this PR belongs is to improve the speed of the Clarity VM. A few comments ago, you mentioned that the speedup you see does not appear to come from the introduction of the Entries API (or, if it does, there's no need to be touching files outside of clarity and perhaps stacks_common). When viewed only through this lens, without regard to the Entries API, this PR is not achieving that -- it's making changes that do not improve the Clarity VM, and it's making changes to things outside of the Clarity VM. As such, I'm leaving my review as-is until these two problems are addressed.

ASuciuX self-assigned this Mar 15, 2024
@ASuciuX (Contributor, Author) commented Mar 15, 2024

@jcnelson I've removed all the modifications outside of the clarity package. Can you please review it?

cc: @jbencin

jcnelson previously approved these changes Mar 18, 2024
jbencin previously approved these changes Mar 18, 2024
ASuciuX enabled auto-merge March 19, 2024 13:10
ASuciuX dismissed stale reviews from jbencin and jcnelson via ab87986 March 22, 2024 17:44
@ASuciuX (Contributor, Author) commented Mar 22, 2024

@jcnelson @jbencin I've removed the .collect() since it was failing the build; it should work now. Can you please re-approve this?

jbencin previously approved these changes Mar 22, 2024
@jbencin (Contributor) left a comment

I assume this was failing to compile because of a merge conflict?

@ASuciuX (Contributor, Author) commented Mar 22, 2024

I assume this was failing to compile because of a merge conflict?

There were no merge conflicts. What I saw was that the .collect() was still in use at the time I created this PR, but on the latest next it had been removed - that's why the compile was failing. I fixed it manually today before updating the branch, so I didn't see any conflicts when re-syncing.

@jbencin (Contributor) commented Mar 23, 2024

It looks like it's because I changed probe_for_generics() to accept impl Iterator in #4500

ASuciuX added a commit to ASuciuX/stacks-core that referenced this pull request Mar 26, 2024
@cylewitruk (Member) left a comment

I've requested (via Slack) that we try to retain the readability of contains_key via a new trait or method, since these changes make the code less obvious as to what it's doing. This is probably something that's done in several other places throughout stacks-core, so having such a util method would likely be beneficial.

@jbencin (Contributor) commented Mar 27, 2024

I've requested (via Slack) that we try to retain the readability of contains_key via a new trait or method

Do you have a suggestion for how to do that?

@cylewitruk (Member) commented:

I've requested (via Slack) that we try to retain the readability of contains_key via a new trait or method

Do you have a suggestion for how to do that?

Yeah I gave a few to @ASuciuX on Slack -- but if it proves difficult I'm happy to take it.

github-merge-queue bot pushed a commit that referenced this pull request Mar 29, 2024
…ains-key

Track functions using mutants from PR #4483
Comment on lines 67 to 90
match contract_ast.referenced_traits.entry(trait_name.to_owned()) {
    Entry::Occupied(_) => {
        return Err(ParseErrors::NameAlreadyUsed(
            trait_name.to_string(),
        )
        .into());
    }
    Entry::Vacant(e) => {
        // Traverse and probe for generics nested in the trait definition
        self.probe_for_generics(
            trait_definition.iter(),
            &mut referenced_traits,
            true,
        )?;

        let trait_id = TraitIdentifier {
            name: trait_name.clone(),
            contract_identifier: contract_ast
                .contract_identifier
                .clone(),
        };
        e.insert(TraitDefinition::Defined(trait_id));
    }
}
Member

This could be replaced with a more idiomatic/cleaner approach which reduces nesting while keeping the use of Entry and maintaining readability :)

Suggested change
let entry = contract_ast
    .referenced_traits
    .entry(trait_name.to_owned());
// If the entry is `Occupied` then the name is already in use.
if let Entry::Occupied(_) = entry {
    return Err(ParseErrors::NameAlreadyUsed(trait_name.to_string()).into());
}
// Traverse and probe for generics nested in the trait definition
self.probe_for_generics(
    trait_definition.iter(),
    &mut referenced_traits,
    true,
)?;
let trait_id = TraitIdentifier {
    name: trait_name.clone(),
    contract_identifier: contract_ast.contract_identifier.clone(),
};
// Ensure a value is in the entry
entry.or_insert_with(|| TraitDefinition::Defined(trait_id));

Member

And then if you want to get really fancy you could package all of this in something like the following for re-use (note: just wrote this in playground, so it's not thoroughly tested)

use std::collections::{HashMap, hash_map::{Entry, OccupiedEntry, VacantEntry}};

pub trait EntryExt<'a, K: 'a, V: 'a> where Self: 'a {
    fn map_occupied(
        &mut self, 
        f: impl FnOnce(&mut OccupiedEntry<'a, K, V>) -> Result<(), MyError>
    ) -> Result<&mut Self, MyError>;

    fn map_vacant(
        &mut self, 
        f: impl FnOnce(&mut VacantEntry<'a, K, V>) -> Result<(), MyError>
    ) -> Result<&mut Self, MyError>;
}

impl<'a, K: 'a, V: 'a> EntryExt<'a, K, V> for Entry<'a, K, V> where Self: 'a {
    fn map_occupied(
        &mut self, 
        f: impl FnOnce(&mut OccupiedEntry<'a, K, V>) -> Result<(), MyError>
    ) -> Result<&mut Self, MyError>{
        if let Entry::Occupied(e) = self {
            f(e)?;
        }
        Ok(self)
    }

    fn map_vacant(
        &mut self, 
        f: impl FnOnce(&mut VacantEntry<'a, K, V>) -> Result<(), MyError>
    ) -> Result<&mut Self, MyError> {
        if let Entry::Vacant(e) = self {
            f(e)?;
        }
        Ok(self)
    }
}

#[derive(Debug)]
pub enum MyError {
    NameAlreadyUsed(String),
}

fn main() -> Result<(), MyError>{
    let mut map = HashMap::<&str, &str>::new();
    let exists = "asdf";
    map.insert(exists, "");

    map.entry(exists)
        //.map_occupied(|_| Err(MyError::NameAlreadyUsed(exists.to_string())))?
        .map_occupied(|_| { // But we'll use this one instead so we get to the next example :)
            println!("We're in the occupied entry for 'asdf'");
            Ok(())
        })?
        .map_vacant(|_| {
            println!("We're in the vacant entry for 'asdf'");
            Ok(())
        })?;

    let doesnt_exist = "qwerty";
    map.entry(doesnt_exist)
        .map_occupied(|_| Err(MyError::NameAlreadyUsed(exists.to_string())))?
        .map_vacant(|_| {
            println!("We're in the vacant entry for 'qwerty'");
            Ok(())
        })?;

    Ok(())
}

Contributor

One minor nitpick here: We should be consistent with Rust naming conventions and use

  • map for infallible conversions
  • and_then for fallible conversions which return a Result

Otherwise this looks cool and would be a nice addition to the StacksHashMap type

@ASuciuX (Contributor, Author) commented Apr 10, 2024

I'll finish the stacking testing and look into this afterwards. Thank you for adding the trait @cylewitruk

Member

@ASuciuX here comes an update for the move issue you ran into:

Trait + Impl

pub trait EntryExt<'a, K: 'a, V: 'a, S: 'a>
where
    S: BuildHasher,
    Self: Sized,
{
    fn occupied_and_then<T, F, E>(self, f: F) -> Result<Self, E>
    where
        F: FnOnce(&OccupiedEntry<'a, K, V, S>) -> Result<T, E>;

    fn or_else_try_insert<F, E>(self, default: F) -> Result<&'a mut V, E>
    where
        F: FnOnce() -> Result<V, E>;

    fn vacant_or<E>(self, err: E) -> Result<Self, E>;
}

impl<'a, K: 'a, V: 'a, S: 'a> EntryExt<'a, K, V, S> for Entry<'a, K, V, S>
where
    S: BuildHasher,
    K: Hash
{    
    fn occupied_and_then<T, F, E>(self, f: F) -> Result<Self, E>
    where 
        F: FnOnce(&OccupiedEntry<'a, K, V, S>) -> Result<T, E>
    {
        match self {
            Entry::Occupied(ref e) => {
                f(e)?;
                Ok(self)
            },
            Entry::Vacant(_) => Ok(self)
        }
    }

    fn or_else_try_insert<F, E>(self, default: F) -> Result<&'a mut V, E>
    where
        F: FnOnce() -> Result<V, E>
    {
        match self {
            Entry::Occupied(e) => Ok(e.into_mut()),
            Entry::Vacant(e) => Ok(e.insert(default()?)),
        }
    }
    
    fn vacant_or<E>(self, err: E) -> Result<Self, E> {
        match self {
            Entry::Occupied(_) => Err(err),
            Entry::Vacant(_) => Ok(self)
        }
    }
}

Usage

match (&args[0].pre_expr, &args[1].pre_expr) {
    (Atom(trait_name), List(trait_definition)) => {
        contract_ast.referenced_traits
            .entry(trait_name.to_owned())
            .vacant_or(ParseErrors::NameAlreadyUsed(trait_name.to_string()))?
            .or_else_try_insert(|| {
                self.probe_for_generics(
                    trait_definition.iter(),
                    &mut referenced_traits,
                    true,
                ).map_err(|e| e.err)?;

                let trait_id = TraitIdentifier {
                    name: trait_name.clone(),
                    contract_identifier: contract_ast
                        .contract_identifier
                        .clone(),
                };
                // Note: I wonder why the compiler can't infer this from the `e.err` above .oO
                Ok::<_, ParseErrors>(TraitDefinition::Defined(trait_id))
            })?;
    }
    _ => return Err(ParseErrors::DefineTraitBadSignature.into()),
}

Member

@jcnelson I invite you to give your input on the above as well as you had comments on the original impl.

let contract_identifier = QualifiedContractIdentifier::new(
contract_ast.contract_identifier.issuer.clone(),
contract_name.clone(),
match contract_ast.referenced_traits.entry(trait_name.to_owned()) {
Member

Same here

@ASuciuX (Contributor, Author) commented Apr 12, 2024

If there is a more appropriate place for the traits and their implementations, please let me know and I will move them.

@jbencin (Contributor) commented Apr 15, 2024

If there is a more appropriate place for the traits and their implementations, please let me know and I will move them.

If @cylewitruk can merge his StacksHashMap changes, that would be the best place to put it. If not, this should go in its own file, maybe in stacks-common/src/util or stacks-common/src/types

@cylewitruk (Member) commented Apr 16, 2024

If @cylewitruk can merge his StacksHashMap changes, that would be the best place to put it. If not, this should go in its own file, maybe in stacks-common/src/util or stacks-common/src/types

I might actually suggest starting a new dir, stacks_common/src/ext, where all "extension" traits can be placed. This pattern can be applied to a lot of repetitive and/or verbose code.

If StacksHashMap can merge then the logic can be placed directly in an impl - this is more of a way to extend the API for non-local types.
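(A minimal sketch of the extension-trait pattern being described here - hypothetical trait and method names, not the proposed StacksHashMap code:)

use std::collections::{hash_map::Entry, HashMap};
use std::hash::Hash;

// An "extension" trait adds methods to a foreign type without wrapping it;
// something like this could live under a stacks_common/src/ext module.
pub trait HashMapExt<K, V> {
    /// Insert `value` only if `key` is absent; returns true if an insert happened.
    fn insert_if_absent(&mut self, key: K, value: V) -> bool;
}

impl<K: Eq + Hash, V> HashMapExt<K, V> for HashMap<K, V> {
    fn insert_if_absent(&mut self, key: K, value: V) -> bool {
        match self.entry(key) {
            Entry::Vacant(e) => {
                e.insert(value);
                true
            }
            Entry::Occupied(_) => false,
        }
    }
}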
