performance boost related to memory allocation #85

Open
digizeph opened this issue Feb 23, 2023 · 0 comments
Labels
enhancement New feature or request

Comments

@digizeph
Member

digizeph commented Feb 23, 2023

Originally posted by @jmeggitt in #81 (comment)

I thought it might be interesting to profile the parser to see which parts actually have the largest impact on performance.

The setup was fairly simple. I wrote a small test program that parsed the first 5 million entries from a table dump and then exited. I compiled it in release mode with debug symbols against bgpkit-parser commit 6055612 and profiled it with Intel VTune, which gave me the following results.

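use std::hint::black_box;
use std::time::Instant;

use bgpkit_parser::BgpkitParser;

fn main() {
    let start_time = Instant::now();
    let parser = BgpkitParser::new("C:\\Users\\Jasper\\Downloads\\bview.20220911.0800.gz").unwrap();
    let mut count = 0;
    // Parse elements until 5 million have been read; black_box keeps the
    // optimizer from discarding the otherwise unused parse results.
    for elem in parser {
        black_box(elem);
        count += 1;
        if count == 5_000_000 {
            break;
        }
    }
    println!("Elapsed: {:?}", start_time.elapsed());
}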
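As a complement to the VTune profile, allocator pressure can also be measured directly by counting heap allocations. The sketch below is not part of the original test program: `CountingAlloc` and `count_allocs` are hypothetical names, and a toy `Vec` workload stands in for the parser loop above. It wraps std's `System` allocator in a `GlobalAlloc` that increments a counter; because the default `GlobalAlloc::realloc` is built on `alloc` and `dealloc`, every `Vec` regrowth is counted too.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical wrapper around the system allocator that counts calls.
struct CountingAlloc;

static ALLOCS: AtomicUsize = AtomicUsize::new(0);
static DEALLOCS: AtomicUsize = AtomicUsize::new(0);

unsafe impl GlobalAlloc for CountingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        ALLOCS.fetch_add(1, Ordering::Relaxed);
        unsafe { System.alloc(layout) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        DEALLOCS.fetch_add(1, Ordering::Relaxed);
        unsafe { System.dealloc(ptr, layout) }
    }
    // realloc is left at its default, which calls self.alloc and
    // self.dealloc, so reallocations show up in the counters above.
}

#[global_allocator]
static GLOBAL: CountingAlloc = CountingAlloc;

/// Returns how many allocations happened while running `f`.
/// The counter is process-wide, so other threads are included.
fn count_allocs<F: FnOnce()>(f: F) -> usize {
    let before = ALLOCS.load(Ordering::Relaxed);
    f();
    ALLOCS.load(Ordering::Relaxed) - before
}

fn main() {
    // Pushing into an empty Vec reallocates each time capacity runs out.
    let grow = count_allocs(|| {
        let mut v = Vec::new();
        for i in 0..1000u32 {
            v.push(i);
        }
        std::hint::black_box(v);
    });
    // Reserving the final size up front needs a single allocation.
    let reserved = count_allocs(|| {
        let mut v = Vec::with_capacity(1000);
        for i in 0..1000u32 {
            v.push(i);
        }
        std::hint::black_box(v);
    });
    println!("push without reserve: {grow} allocations");
    println!("with_capacity:        {reserved} allocations");
}
```

Wrapping the parsing loop from the test program in `count_allocs` would give an allocation count per parsed element, which is easier to compare across commits than a sampled profile.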

Here was the result of that run. I included the image for context, but much of it is unreadable without clicking the various segments.
[Image: Intel VTune profile of the table dump run]
Here are a couple of the parts I found interesting:

  • Elementor::record_to_elems took up 11.9% of the total CPU time, but the vast majority (67.7%) of that time was spent waiting on the system allocator. At a quick glance, all of these allocations came from Vec.
  • The function that took the most CPU time (42.0%) was ReadUtils::read_nlri_prefix. This is not that surprising given the type of file being parsed, but it looks like there are a number of ways that this could be improved.
  • 26.1% of the entire application runtime was spent allocating and freeing memory.
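One possible direction for ReadUtils::read_nlri_prefix, assuming part of its cost is a heap allocation per prefix, is to decode each IPv4 NLRI entry (RFC 4271, section 4.3: a one-byte prefix length in bits, followed by just enough bytes to hold the prefix) into a fixed-size stack buffer. `Ipv4Prefix` and `read_ipv4_nlri_prefix` are hypothetical names, not bgpkit-parser's API, and the sketch ignores IPv6 and ADD-PATH path identifiers.

```rust
use std::net::Ipv4Addr;

/// A parsed IPv4 prefix. It is a small Copy value, so it never touches the heap.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Ipv4Prefix {
    addr: Ipv4Addr,
    len: u8,
}

/// Decodes one IPv4 NLRI entry from the front of `input`.
/// Returns the prefix and the remaining bytes, or None if the input is malformed.
fn read_ipv4_nlri_prefix(input: &[u8]) -> Option<(Ipv4Prefix, &[u8])> {
    let (&bit_len, rest) = input.split_first()?;
    if bit_len > 32 {
        return None;
    }
    // Only ceil(bit_len / 8) address bytes are present on the wire.
    let byte_len = (bit_len as usize + 7) / 8;
    if rest.len() < byte_len {
        return None;
    }
    // Copy into a fixed-size stack array instead of allocating a Vec.
    let mut octets = [0u8; 4];
    octets[..byte_len].copy_from_slice(&rest[..byte_len]);
    let prefix = Ipv4Prefix {
        addr: Ipv4Addr::from(octets),
        len: bit_len,
    };
    Some((prefix, &rest[byte_len..]))
}

fn main() {
    // Two NLRI entries back to back: 10.0.0.0/8 and 192.168.1.0/24.
    let mut buf: &[u8] = &[8, 10, 24, 192, 168, 1];
    while !buf.is_empty() {
        let (prefix, rest) = read_ipv4_nlri_prefix(buf).expect("malformed NLRI");
        println!("{}/{}", prefix.addr, prefix.len);
        buf = rest;
    }
}
```

With decoding allocation-free, collecting the prefixes of one update costs at most one allocation for the whole Vec rather than one per prefix.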

Because a table dump gives somewhat biased results, I also ran the test on one of the largest updates files I could find for rrc15 (updates.20230124.0750.gz, 31MB). The test code was exactly the same except for the file path.
[Image: Intel VTune profile of the updates file run]
In this case, the majority (59.1%) of the CPU time was spent allocating and freeing memory using the system allocator. This is a bit alarming, since it means more time was spent waiting on allocations than performing any meaningful processing. An additional 7.8% of the CPU time was spent in memcpy. It is harder to tell whether memcpy is being overused, but roughly a third of that time seems to come from values being cloned in bgp_update_to_elems.
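The profile does not identify which values are cloned in bgp_update_to_elems. Assuming they are mostly attributes that every prefix in an update shares (for example the AS path), one option is to wrap those attributes in an `Arc`, so each per-prefix element holds a reference-counted pointer instead of its own deep copy. `Attributes`, `Elem`, and `update_to_elems` are simplified hypothetical types, not bgpkit-parser's BgpElem.

```rust
use std::net::Ipv4Addr;
use std::sync::Arc;

/// Attributes shared by every prefix in one update (simplified, hypothetical).
#[derive(Debug)]
struct Attributes {
    as_path: Vec<u32>,
}

/// One per-prefix element (hypothetical, not bgpkit-parser's BgpElem).
#[derive(Debug, Clone)]
struct Elem {
    prefix: (Ipv4Addr, u8),
    attrs: Arc<Attributes>,
}

/// Builds one element per announced prefix. All elements point at the same
/// Attributes allocation; Arc::clone only increments a reference count.
fn update_to_elems(prefixes: &[(Ipv4Addr, u8)], attrs: Attributes) -> Vec<Elem> {
    let shared = Arc::new(attrs);
    prefixes
        .iter()
        .map(|&prefix| Elem {
            prefix,
            attrs: Arc::clone(&shared),
        })
        .collect()
}

fn main() {
    let attrs = Attributes {
        as_path: vec![64496, 64511, 65551],
    };
    let prefixes: [(Ipv4Addr, u8); 2] = [
        (Ipv4Addr::new(10, 0, 0, 0), 8),
        (Ipv4Addr::new(192, 168, 1, 0), 24),
    ];
    let elems = update_to_elems(&prefixes, attrs);
    for e in &elems {
        println!("{}/{} via AS path {:?}", e.prefix.0, e.prefix.1, e.attrs.as_path);
    }
    // Both elements share one Attributes value; the AS path was never copied.
    assert!(Arc::ptr_eq(&elems[0].attrs, &elems[1].attrs));
}
```

The trade-off is that consumers see an `Arc` in the element type. If elements never cross threads, `Rc` would avoid the atomic reference-count updates.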

An easy way to get a sizable performance boost might be to use a crate like smallvec, tinyvec, or arrayvec. With some variations, they all provide Vec-like data structures that store a fixed number of elements inline (on the stack, when the container itself is on the stack) instead of allocating on the heap. smallvec and tinyvec's TinyVec move their contents to the heap once that inline capacity is exceeded, while arrayvec has a hard capacity limit and never allocates. This could have a large impact on performance in cases where you need the flexibility of a Vec but know that it will usually hold only a small number of elements.

In fact, if you enable smallvec's union feature, it stores inline values in the same two words it uses for the heap pointer and length once spilled, with the remaining word holding the length while inline. This means that if the inline items take up at most 2 machine words (16 bytes on x64), a SmallVec is exactly the same size as a Vec (24 bytes on x64), just without the heap allocation.
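The size claim can be checked with std alone. The sketch below re-creates a union-based layout: one word that holds the length while inline (and the capacity once spilled), followed by a union of the inline array and a heap (pointer, length) pair. `UnionSmallVec` and `Data` are hypothetical names, this is not smallvec's source, and the types are only passed to `size_of`, never constructed. It assumes element types whose alignment is no larger than `usize`'s.

```rust
use std::mem::{size_of, ManuallyDrop, MaybeUninit};
use std::ptr::NonNull;

/// Either N inline elements or a heap (pointer, length) pair, sharing storage.
#[allow(dead_code)]
union Data<T, const N: usize> {
    inline: ManuallyDrop<MaybeUninit<[T; N]>>,
    heap: (NonNull<T>, usize),
}

/// Hypothetical layout sketch: no enum tag, just one extra word.
#[allow(dead_code)]
struct UnionSmallVec<T, const N: usize> {
    // Length while inline, heap capacity once spilled.
    capacity: usize,
    data: Data<T, N>,
}

fn main() {
    // A Vec is three words: pointer, capacity, length.
    println!("Vec<u32>:              {} bytes", size_of::<Vec<u32>>());
    // 4 x u32 = 16 bytes of inline data fits in the pointer + length words.
    println!("UnionSmallVec<u32, 4>: {} bytes", size_of::<UnionSmallVec<u32, 4>>());
    // 5 x u32 = 20 bytes no longer fits, so the union and the struct grow.
    println!("UnionSmallVec<u32, 5>: {} bytes", size_of::<UnionSmallVec<u32, 5>>());
}
```

On x86-64 this should print 24, 24, and 32 bytes: four u32 values fill exactly the two words, while a fifth pushes the union to three words.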

digizeph added this to the V0.10 milestone Feb 23, 2023
digizeph added the enhancement New feature or request label Mar 9, 2023
digizeph removed this from the V0.10 milestone Dec 23, 2023
Development

No branches or pull requests

1 participant